PREFACE TO THE SECOND EDITION


The book is a self-contained introduction to elementary probability theory and
stochastic processes with special emphasis on their applications in science, engineer- 
ing, finance, computer science and operations research. It provides theoretical founda- 
tions for modeling time-dependent random phenomena in these areas and illustrates 
their application through the analysis of numerous, practically relevant examples. As 
a non-measure theoretic text, the material is presented in a comprehensible, applica- 
tion-oriented way. Its study only assumes a mathematical maturity which students of 
applied sciences acquire during their undergraduate studies in mathematics. The study
of stochastic processes and of their foundation, probability theory, like that of any other
mathematically based science, requires less routine effort but more creative work on one's
own. Therefore, numerous exercises have been added to enable readers to assess to
what extent they have grasped the subject. Solutions to many of the exercises can
be downloaded from the website of the Publishers, or the exercises are given together
with their solutions. A complete solutions manual is available to instructors from the
Publishers. To make the book attractive to theoretically interested readers as well, 
some important proofs and challenging examples and exercises have been included. 
‘Starred' exercises belong to this category. The chapters are organized in such a way 
that reading a chapter usually requires knowledge of some of the previous ones. The 
book has been developed in part as a course text for undergraduates and for 
self-study by non-statisticians. Some sections may also serve as a basis for pre- 
paring senior undergraduate courses. 

The text is a thoroughly revised and supplemented version of the first edition so that 
it is to a large extent a new book: The part on probability theory has been completely 
rewritten and more than doubled. Several new sections have been included in the part 
about stochastic processes as well: Time series analysis, random walks, branching 
processes, and spectral analysis of stationary stochastic processes. Theoretically more 
challenging sections have been deleted and mainly replaced with a comprehensive 
numerical discussion of examples. All in all, the volume of the book has increased by 
about a third. 

This book does not extensively deal with data analysis aspects in probability and sto- 
chastic processes. But sometimes connections between probabilistic concepts and the 
corresponding statistical approaches are established to facilitate the understanding. 
The author has no doubt the book will help students to pass their exams and practi- 
cians to apply stochastic modeling in their own fields of expertise. 


The author is thankful for the constructive feedback from many readers of the first 
edition. Helpful comments on the second edition are very welcome as well and should
be directed to: Frank.Beichelt@wits.ac.za. 


Johannesburg, March 2016 Frank Beichelt 
SYMBOLS AND ABBREVIATIONS


Om @ symbols after an example, a theorem, a definition 

$f(t) \equiv c$    $f(t) = c$ for all $t$ being element of the domain of definition of $f$
$f * g$    convolution of two functions $f$ and $g$
$f^{*(n)}$    $n$th convolution power of $f$
$\hat{f}(s)$, $L\{f, s\}$    Laplace transform of a function $f$
$o(x)$    Landau order symbol
$\delta_{ij}$    Kronecker symbol

Probability Theory

$X, Y, Z$    random variables
$E(X)$, $\mathrm{Var}(X)$    mean (expected) value of $X$, variance of $X$
$f_X(x)$, $F_X(x)$    probability density function, (cumulative probability) distribution function of $X$
$F_Y(y \mid x)$, $f_Y(y \mid x)$    conditional distribution function, density of $Y$ given $X = x$
$X_t$, $F_t(x)$    residual lifetime of a system of age $t$, distribution function of $X_t$
$E(Y \mid x)$    conditional mean value of $Y$ given $X = x$
$\lambda(x)$, $\Lambda(x)$    failure rate, integrated failure rate (hazard function)
$N(\mu, \sigma^2)$    normally distributed random variable (normal distribution) with mean value $\mu$ and variance $\sigma^2$
$\varphi(x)$, $\Phi(x)$    probability density function, distribution function of a standard normal random variable $N(0, 1)$
$f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$    joint probability density function of $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
$F_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$    joint distribution function of $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
$\mathrm{Cov}(X, Y)$, $\rho(X, Y)$    covariance, correlation coefficient between $X$ and $Y$
$M(z)$    z-transform (moment generating function) of a discrete random variable or of its probability distribution, respectively

Stochastic Processes

$\{X(t), t \in T\}$, $\{X_t, t \in T\}$    continuous-time, discrete-time stochastic process with parameter space $T$
$\mathbf{Z}$    state space of a stochastic process
$f_t(x)$, $F_t(x)$    probability density, distribution function of $X(t)$
$f_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n)$, $F_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n)$    joint density, distribution function of $(X(t_1), X(t_2), \ldots, X(t_n))$
$m(t)$    trend function of a stochastic process
$C(s, t)$    covariance function of a stochastic process
$C(\tau)$    covariance function of a stationary stochastic process
$C(t)$, $\{C(t), t \ge 0\}$    compound random variable, compound stochastic process
$\rho(s, t)$    correlation function of a stochastic process
$\{T_1, T_2, \ldots\}$    random point process
$\{Y_1, Y_2, \ldots\}$    sequence of interarrival times, renewal process
$N$    integer-valued random variable, discrete stopping time
$\{N(t), t \ge 0\}$    (random) counting process
$N(s, t)$    increment of a counting process in $(s, t]$
$H(t)$, $H_1(t)$    renewal function of an ordinary, delayed renewal process
$A(t)$    forward recurrence time, point availability
$B(t)$    backward recurrence time
$R(t)$, $\{R(t), t \ge 0\}$    risk reserve, risk reserve process
$A$, $A(t)$    stationary (long-run) availability, point availability
$p_{ij}$, $p_{ij}^{(n)}$    one-step, $n$-step transition probabilities of a homogeneous, discrete-time Markov chain
$p_{ij}(t)$; $q_{ij}$, $q_i$    transition probabilities; conditional, unconditional transition rates of a homogeneous, continuous-time Markov chain
$\{\pi_i; i \in \mathbf{Z}\}$    stationary state distribution of a homogeneous Markov chain
$\pi_0$    extinction probability, vacant probability (sections 8.5, 9.7)
$\lambda_j$, $\mu_j$    birth, death rates
$\lambda$, $\mu$, $\rho$    arrival rate, service rate, traffic intensity $\lambda/\mu$ (in queueing models)
$\mu_i$    mean sojourn time of a semi-Markov process in state $i$
$\mu$    drift parameter of a Brownian motion process with drift
$W$    waiting time in a queueing system
$L$    lifetime, cycle length, queue length, continuous stopping time
$L(x)$    first-passage time with regard to level $x$
$L(a, b)$    first-passage time with regard to level $\min(a, b)$
$\{B(t), t \ge 0\}$    Brownian motion (process)
$\sigma^2$, $\sigma$    $\sigma^2 = \mathrm{Var}(B(1))$ variance parameter, volatility
$\{S(t), t \ge 0\}$    seasonal component of a time series (section 6.4), standardized Brownian motion (chapter 11)
$\{B(t), 0 \le t \le 1\}$    Brownian bridge
$\{D(t), t \ge 0\}$    Brownian motion with drift
$M(t)$    absolute maximum of the Brownian motion (with drift) in $[0, t]$
$M$    absolute maximum of the Brownian motion (with drift) in $[0, \infty)$
$\{U(t), t \ge 0\}$    Ornstein-Uhlenbeck process, integrated Brownian motion process
$\omega$, $w$    circular frequency, bandwidth
$s(\omega)$, $S(\omega)$    spectral density, spectral function (chapter 12)


Introduction 


Is the world a well-ordered entirety, 
or a random mixture,
which nevertheless is called world-order? 


Marc Aurel 


Random influences or phenomena occur everywhere in nature and social life. Their 
consideration is an indispensable requirement for being successful in the natural, economic,
social, and engineering sciences. Random influences partially or fully contribute
to the variability of parameters like wind velocity, rainfall intensity, electromagnetic
noise levels, fluctuations of share prices, failure time points of technical units, and
the times at which births and deaths occur in biological populations, earthquakes
happen, or customers arrive at service centers. Random influences induce random events.
An event is called random if, under given conditions, it may or may not occur. For instance,
the events that during a thunderstorm a certain house will be struck by lightning, a
child will reach adulthood, at least one shooting star appears in a specified time
interval, a production process comes to a standstill for lack of material, or a cancer
patient survives for 5 years after chemotherapy are random. Border cases of random events
are the deterministic events, namely the certain event and the impossible event. Under
given conditions, a certain (impossible) event will always (never) occur. For
instance, it is absolutely sure that lead, when heated to a temperature of over
327.5°C, will become liquid, but that lead will turn to gold during the heating process
is an impossible event. Random is the shape that liquid lead assumes if poured on an
even steel plate, and random is also the occurrence of future events which are predicted
from the form of these castings. Even if the reader is not a lottery, card, or
dice player, she/he will be confronted in her/his daily routine with random influences 
and must take into account their implications: When your old coffee machine fails 
after an unpredictable number of days, you go to the supermarket and pick a new one 
from the machines of your favorite brand. At home, when trying to make your first 
cup of coffee, you realize that you belong to the few unlucky ones who picked by 
chance a faulty machine. A car driver, when estimating the length of the trip to his 
destination, has to take into account that his vehicle may start only with delay, that a 
traffic jam could slow down the progress, and that scarce parking opportunities may 
cause further delay. Also, at the end of a year the overwhelming majority of car
drivers realize that having taken out a policy has only enriched the insurance company.
Nevertheless, they will renew their policy because people tend to prefer moderate
regular costs, even if they arise in the long term, to the risk of larger unscheduled costs.
Hence it is not surprising that insurance companies were among the first institutions
that had a direct practical interest in making use of methods for the quantitative
evaluation of random influences and in turn gave important impulses for the development
of such methods. It is probability theory that provides the necessary
mathematical tools for their work.


Probability theory deals with the investigation of the regularities to which random
events are subject.


The existence of such statistical or stochastic regularities may come as a surprise to 
philosophically less educated readers, since at first glance it seems to be paradoxic- 
al to combine regularity and randomness. But even without philosophy and without 
probability theory, some simple regularities can already be illustrated at this stage: 


1) When a fair die is thrown once, one of the integers from 1 to 6 will appear
and no regularity can be observed. But if a die is thrown repeatedly, then the fraction
of throws with outcome 1, say, will converge to the value 1/6 with increasing number
of throws. (A die is called fair if each integer has the same chance to appear.)

2) If a specific atom of a radioactive substance is observed, then the time from the 
beginning of its observation to its disintegration cannot be predicted with certainty, 
i.e., this time is random. On the other hand, one knows the half-life period of a radioactive
substance, i.e., one can predict with absolute certainty after which time, from
say originally 10 grams (trillions of atoms) of the substance, exactly 5 grams are left.

3) Random influences can also take effect by being superimposed on purely deterministic
processes. A simple example is the measurement of a physical parameter, e.g., the
temperature. There is nothing random about this parameter when it refers to a specific
location at a specific time. However, when this parameter has to be measured
with sufficiently high accuracy, then, even under identical measurement
conditions, different measurements will usually show different values. This is, e.g.,
due to the degree of inaccuracy which is inherent in every measuring method, and to
subjective factors. A statistical regularity in this situation is that with increasing
number of measurements, which are carried out independently and are not biased by
systematic errors, the arithmetic mean of these measurements converges towards the
true temperature.

4) Consider the movement of a tiny particle in a container filled with a liquid. It 
moves along zig-zag paths in an apparently chaotic motion. This motion is generated 
by the huge number of impacts the particle is exposed to with surrounding molecules 
of the fluid. Under average conditions, there are about $10^{21}$ collisions per second
between particle and molecules. Hence, a deterministic approach to modeling the 
motion of particles in a fluid is impossible. This movement has to be dealt with as a 
random phenomenon. But the pressure within the container generated by the vast 
number of impacts of fluid molecules with the sidewalls of the container is constant. 
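
Examples 1 and 3 can be tried out numerically. The following Python sketch is an illustration only and not part of the original text. It assumes that a pseudo-random number generator is an acceptable stand-in for a fair die, and that measurement errors are normally distributed around a hypothetical true temperature of 20.0 with standard deviation 0.5; the function names are likewise made up for this sketch.

```python
import random


def relative_frequency_of_one(num_throws, rng):
    """Fraction of throws of a (simulated) fair die that show a 1."""
    hits = sum(1 for _ in range(num_throws) if rng.randint(1, 6) == 1)
    return hits / num_throws


def mean_of_measurements(true_value, noise_sd, num_measurements, rng):
    """Arithmetic mean of independent, unbiased noisy measurements.

    Assumption: errors are normal with mean 0 and standard deviation noise_sd.
    """
    total = sum(rng.gauss(true_value, noise_sd) for _ in range(num_measurements))
    return total / num_measurements


rng = random.Random(2016)  # fixed seed so that a run can be repeated
for n in (10, 1_000, 100_000):
    freq = relative_frequency_of_one(n, rng)
    mean = mean_of_measurements(20.0, 0.5, n, rng)
    print(f"n = {n:>7}: relative frequency of '1' = {freq:.4f}, mean = {mean:.4f}")
```

With growing n, the printed relative frequency should settle near 1/6 and the printed mean near the assumed true value, while individual throws and measurements remain unpredictable.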


Examples 1 to 4 show the nature of a large class of statistical regularities:


The superposition of a large number of random influences leads under certain 
conditions to deterministic phenomena. 

Deterministic regularities (law of falling bodies, spreading of waves, Ohm's law,
chemical reactions, theorem of Pythagoras) can be verified in a single experiment if
the underlying assumptions are fulfilled. But, although statistical regularities can be
proved in a mathematically exact way just as the theorem of Pythagoras or the rules
of differentiation and integration of real functions, their experimental verification
requires a huge number of repetitions of one and the same experiment. Even leading
scientists spared no effort to do just this. The Comte de Buffon (1707-1788) and
the mathematician Karl Pearson (1857-1936) flipped a fair coin several
thousand times and recorded how often 'head' appeared. The following table
shows their results (n total number of flips, m number of outcomes 'head'):


                 n        m       m/n
Buffon        4040     2048    0.5069


Thus, the more frequently a coin is flipped, the closer the ratio m/n approaches the
value 1/2 (compare with example 1 above). In view of the large number of flips,
this principal observation is surely not a random result, but can be confirmed
by all those readers who take pleasure in repeating these experiments. However,
nowadays the experiment 'flipping a coin' many thousand times is done by a computer
with a 'virtual coin' in a few seconds. The ratio m/n is called the relative frequency
of the occurrence of the random event 'head appears.'
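
A 'virtual coin' of this kind takes only a few lines of code. The sketch below is one possible illustration in Python, not the author's program: the function name `flip_virtual_coin` and the chosen numbers of flips (Buffon's 4040 and two larger multiples of it) are assumptions made for this example, and a pseudo-random generator stands in for the fair coin.

```python
import random


def flip_virtual_coin(n, seed=None):
    """Flip a simulated fair coin n times.

    Returns (m, m / n), where m is the number of heads and m / n is the
    relative frequency of the event 'head appears'.
    """
    rng = random.Random(seed)
    m = sum(rng.random() < 0.5 for _ in range(n))
    return m, m / n


for n in (4040, 40_400, 404_000):
    m, freq = flip_virtual_coin(n, seed=n)
    print(f"n = {n:>7}   m = {m:>7}   m/n = {freq:.4f}")
```

Different seeds give different values of m, but for large n the ratio m/n should stay close to 1/2.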

Already the expositions made so far may have convinced many readers that random 
phenomena are not figments of human imagination, but that their existence is object- 
ive reality. There have been attempts to deny the existence of random phenomena by 
arguing that if all factors and circumstances, which influence the occurrence of an 
event are known, then an absolutely sure prediction of its occurrence is possible. In 
other words, the protagonists of this thesis consider the creation of the concept of 
randomness only as a sign of 'human imperfection.' The young Pierre-Simon
Laplace (1749-1827) believed that the world is down to the last detail governed by
deterministic laws. Two of his famous statements concerning this are: 'The curve 
described by a simple molecule of air in any gas is regulated in a manner as certain 
as the planetary orbits. The only difference between them lies in our ignorance.’ And: 
'Give me all the necessary data, and I will tell you the exact position of a ball on a 
billiard table’ (after having been pushed). However, this view has proved futile both 
from the philosophical and the practical point of view. Consider, for instance, a 
biologist who is interested in the movement of animals in the wilderness. How on 
earth is he supposed to be in a position to collect all that information, which would 
allow him to predict the movements of only one animal in a given time interval with 
absolute accuracy? Or imagine the amount of information and the
corresponding software you would need to determine the exact path of a particle which
travels in a fluid when there are $10^{21}$ collisions with surrounding molecules per second.
It is an unrealistic and impossible task to deal with problems like that in a deterministic way.
The physicist Marian von Smoluchowski (1872 — 1917) wrote in a paper published in 
1918 that ‘all theories are inadequate, which consider randomness as an unknown 
partial cause of an event. The chance of the occurrence of an event can only depend 
on the conditions, which have influence on the event, but not on the degree of our 
knowledge.' 


Already at a very early stage of dealing with random phenomena the need arose to 
quantify the chance, the degree of certainty, or the likelihood for the occurrence of 
random events. This was done by defining the probability of random events and
by developing methods for its calculation. For now the following explanation is 
given: The probability of a random event is a number between 0 and 1. The imposs- 
ible event has probability 0, and the certain event has probability 1. The probability 
of a random event is the closer to 1, the more frequently it occurs. Thus, if in a long 
series of experiments a random event A occurs more frequently than a random event 
B, then A has a larger probability than B. In this way, assigning probabilities to 
random events allows comparisons with regard to the frequency of their occurrence 
under identical conditions. There are other approaches to the definition of probabili- 
ty than the classical (frequency) approach, to which this explanation refers. For 
beginners the frequency approach is likely the most comprehensible one. 


Gamblers, in particular dice gamblers, were likely the first people who were in need
of methods for comparing the chances of the occurrence of random events, i.e., the 
chances of winning or losing. Already in the medieval poem De Vetula of Richard de 
Fournival (ca 1200-1250) one can find a detailed discussion about the total number 
of possibilities to achieve a certain number, when throwing 3 dice. Geronimo 
Cardano (1501-1576) determined in his book Liber de Ludo Aleae the number of 
possibilities to achieve the total outcomes 2, 3, ..., 12 when two dice are thrown. For
instance, there are two possibilities to achieve the outcome 3, namely (1,2) and (2,1),
whereas 2 is achieved only when (1,1) occurs. (The notation (i, j) means
that one die shows an i and the other one a j.) Galileo Galilei (1564-1642) proved
by analogous reasoning that, when throwing 3 dice, the probability to get the (total) 
outcome 10 is larger than the probability to get a 9. The gamblers knew this from 
their experience, and they had asked Galilei to find a mathematical proof. The 
Chevalier de Méré formulated three problems related to games of chance and asked 
the French mathematician Blaise Pascal (1623 — 1662) for solutions: 


1) What is more likely, to obtain at least one 6 when throwing a die four times, or
to obtain the outcome (6,6) at least once in a series of 24 throws of two dice?

2) How many times does one have to throw two dice at least so that the probability of
achieving the outcome (6,6) is larger than 1/2?

3) In a game of chance, two equivalent gamblers each need a certain number of points
to become winners. How should the stake be fairly divided between the gamblers when, for
some reason or other, the game has to be broken off prematurely? (This problem of
the fair division had already been formulated before de Méré, e.g., in De Vetula.)
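
The counting arguments of Cardano and Galilei and the first two problems of de Méré can be checked by brute-force enumeration. The following Python sketch is an added illustration, not the historical reasoning. It assumes fair dice and independent throws, so that all ordered outcomes are equally likely, and it reads problem 2 as asking for at least one (6,6) in n throws; all function and variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_counts(num_dice):
    """Number of ordered outcomes of num_dice dice for each possible total."""
    return Counter(sum(faces) for faces in product(range(1, 7), repeat=num_dice))


two = sum_counts(2)      # Cardano: totals 2, 3, ..., 12 with two dice
three = sum_counts(3)    # Galilei: totals 9 and 10 with three dice
print("two dice, ways per total:", dict(sorted(two.items())))
print("three dice: total 9 in", three[9], "ways, total 10 in", three[10], "ways")

# de Mere, problem 1: at least one 6 in 4 throws of one die
# versus at least one (6,6) in 24 throws of two dice.
p_one_six = 1 - Fraction(5, 6) ** 4
p_double_six = 1 - Fraction(35, 36) ** 24
print("P(at least one 6 in 4 throws)      =", float(p_one_six))
print("P(at least one (6,6) in 24 throws) =", float(p_double_six))


def min_throws_for_double_six(threshold=Fraction(1, 2)):
    """de Mere, problem 2: smallest n with P(at least one (6,6) in n throws) > threshold."""
    n = 1
    while 1 - Fraction(35, 36) ** n <= threshold:
        n += 1
    return n


print("smallest number of throws:", min_throws_for_double_six())
```

Under these assumptions the total 10 can be obtained in 27 ways with three dice and the total 9 in 25 ways, the first event of problem 1 has probability 671/1296 (about 0.518) against about 0.491 for the second, and 25 throws are needed in problem 2.
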

Pascal sent these problems to Pierre Fermat (1601-1665), and both found their
solutions, although by applying different methods. It is generally accepted that this
work of Pascal and Fermat marked the beginning of the development of probability
theory as a mathematical discipline. Their work was continued by famous
scientists such as Christiaan Huygens (1629-1695), Jakob Bernoulli (1654-1705),
Abraham de Moivre (1667-1754), Carl Friedrich Gauss (1777-1855), and last
but not least Siméon Denis Poisson (1781-1840). However, probability theory
outgrew its infancy only in the 1930s, when the Russian
mathematician Andrej Nikolajewic Kolmogorov (1903-1987) found the solution to
one of the famous Hilbert problems, namely to put probability theory, like any other
mathematical discipline, on an axiomatic foundation.


Nowadays, probability theory together with its applications in science, medicine,
engineering, economics, and other areas is integrated in the field of stochastics. The linguistic
origin of this term can be found in the Greek word stochastikon. (Originally, this term
denoted the ability of seers to be correct with their forecasts.) Apart from probability
theory, mathematical statistics is the most important part of stochastics. One of its key
subjects is to infer, by probabilistic methods, from a sample taken from a set of interesting
objects, called among other names sample space or universe, parameters or properties of
the sample space (inferential statistics). Let us assume we have a lot of 10 000
electronic units. To obtain information on what percentage of these units is faulty, we
take a sample of 100 units from this lot. In the sample, 4 units are faulty. Of course,
this figure does not imply that there are exactly 400 faulty units in the lot. But
inferential statistics will enable us to construct lower and upper bounds for the
percentage of faulty units in the lot, which enclose the 'true percentage' with a given
high probability. Problems like this led to the development of an important part of
mathematical statistics, statistical quality control. Phenomena which depend both
on random and deterministic influences gave rise to the theory of stochastic
processes. For instance, meteorological parameters like temperature and air pressure
are random, but obviously also depend on time and altitude. Fluctuations of share
prices are governed by chance, but are also driven by periods of economic upturns and
downturns. Electromagnetic noise caused by the sun is random, but also depends on
the periodical variation of the intensity of sunspots.
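
To make the idea of lower and upper bounds for the percentage of faulty units concrete, the following Python sketch computes an approximate 95% confidence interval for the sample of 100 units with 4 faulty ones. It is an added illustration under stated assumptions: the text does not say which method is meant, so the Wilson score interval is used here as one common choice, and drawing 100 units from a lot of 10 000 is treated as approximately binomial sampling. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(num_faulty, sample_size, confidence=0.95):
    """Approximate two-sided Wilson score interval for a proportion.

    Assumption: the sample is small relative to the lot, so the number of
    faulty units in the sample is treated as binomially distributed.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # standard normal quantile
    p_hat = num_faulty / sample_size
    denom = 1 + z * z / sample_size
    center = (p_hat + z * z / (2 * sample_size)) / denom
    half_width = z * sqrt(
        p_hat * (1 - p_hat) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return center - half_width, center + half_width


lower, upper = wilson_interval(4, 100)
print(f"faulty units in the lot: between {100 * lower:.1f}% and {100 * upper:.1f}%")
```

The observed 4% lies inside the interval, but the bounds (roughly 2% and 10%) show how much uncertainty a sample of only 100 units leaves about the whole lot.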

Stochastic modeling in operations research comprises disciplines like queueing 
theory, reliability theory, inventory theory, and decision theory. All of them play an 
important role in applications, but also have given many impulses for the theoretical 
enhancement of the field of stochastics. Queueing theory provides the theoretical 
foundation for the quantitative evaluation and optimization of queueing systems, i.e.,
service systems like workshops, supermarkets, computer networks, filling stations, 
car parks, and junctions, but also military defense systems for 'serving' the enemy. 
Inventory theory helps with designing warehouses (storerooms) so that they can on 
the one hand meet the demand for goods with sufficiently high probability, and on 
the other hand keep the costs for storage as small as possible. The key problem with 
dimensioning queueing systems and storage capacities is that flows of customers,
service times, demands, and delivery times of goods after ordering are subject to
random influences. A main problem of reliability theory is the calculation of the 
reliability (survival probability, availability) of a system from the reliabilities of its 
subsystems or components. Another important subject of reliability theory is modeling
the aging behavior of technical systems, which incidentally provides tools for
the survival analysis of human beings and other living beings. Chess computers got
their intelligence from game theory, which arose from the abstraction of games of
chance. But opponents within this theory can also be competing economic blocs or
military enemies. Modern communication would be impossible without information
theory. This theory provides the mathematical foundations for a reliable transmission
of information although signals may be subject to noise at the transmitter, during
transmission, and at the receiver. In order to verify stochastic regularities, nowadays
no scientist needs to manually repeat thousands of experiments. Computers do this
job much more efficiently. They are in a position to virtually replicate the operation
of even highly complex systems, which are subjected to random influences, to any
degree of accuracy. This process is called (Monte Carlo) simulation. Many more
fruitful applications of stochastic (probabilistic) methods exist in fields like physics
(kinetic gas theory, thermodynamics, quantum theory), astronomy (stellar statistics),
biology (genetics, genomics, population dynamics), artificial intelligence (inference
under uncertainty), medicine, agronomy and forestry (design of experiments,
yield prediction) as well as in economics (time series analysis) and social
sciences. There is no doubt that probabilistic methods will open more and more
possibilities for applications, which in turn will lead to a further enhancement of the
field of stochastics.
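
Two ideas from this paragraph, computing the reliability of a system from the reliabilities of its components and Monte Carlo simulation, can be combined in a small Python sketch. It is an added illustration only. The system structure (components 0 and 1 in parallel, followed in series by component 2), the component reliabilities 0.9, 0.8, and 0.95, and the assumption that components fail independently are all hypothetical choices made for this example.

```python
import random


def system_works(up):
    """Hypothetical structure: components 0 and 1 in parallel, in series with 2."""
    return (up[0] or up[1]) and up[2]


def simulate_reliability(component_reliabilities, num_runs, seed=None):
    """Monte Carlo estimate of the probability that the system works.

    Assumption: components work or fail independently of each other.
    """
    rng = random.Random(seed)
    working = 0
    for _ in range(num_runs):
        up = [rng.random() < p for p in component_reliabilities]
        working += system_works(up)
    return working / num_runs


p = (0.9, 0.8, 0.95)
# Exact value for this structure under independence.
exact = (1 - (1 - p[0]) * (1 - p[1])) * p[2]
estimate = simulate_reliability(p, 200_000, seed=1)
print(f"exact reliability {exact:.4f}, simulated {estimate:.4f}")
```

For this simple structure the exact value is easy to write down, which makes it a convenient check of the simulation; for complex systems the simulation is often the only practical route.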


More than 300 years ago, the famous Swiss mathematician Jakob Bernoulli
proposed in his book Ars Conjectandi the recognition of stochastics as an independ- 
ent new science, the subject of which he introduced as follows: 


To conjecture about something is to measure its probability: The Art of conjecturing 
or the Stochastic Art is therefore defined as the art of measuring as exactly as possi- 
ble the probability of things so that in our judgement and actions we always can 
choose or follow that which seems to be better, more satisfactory, safer and more 
considered. 


In line with Bernoulli's proposal, an independent science of stochastics would have 
to be characterized by two features: 

1) The subject of stochastics is uncertainty caused by randomness and/or ignorance. 
2) Its methods, concepts, and language are based on mathematics. 


But even now, in the twenty-first century, an independent science of stochastics is
still far from being officially established. There is, however, strong support
for such a move from internationally leading academics; see von Collani (2003).


PART I 
Probability Theory 


There is no credibility in sciences in which 
no mathematical theory can be applied, 
and no credibility in fields which have no 
connections to mathematics. 


Leonardo da Vinci 


CHAPTER 1 


Random Events and Their Probabilities 


1.1 RANDOM EXPERIMENTS 


If water is heated up to 100°C at an air pressure of 101 325 Pa, then it will inevitably
start boiling. A motionless pendulum, when being pushed, will start swinging. If
ferrous sulfide is mixed with hydrochloric acid, then a chemical reaction starts, which
releases hydrogen sulfide. These are examples of experiments with deterministic
outcomes. Under specified conditions they yield an outcome which is known
in advance.


Somewhat more complicated is the situation with random experiments, or experiments
with random outcomes. They are characterized by two properties:

1. Repetitions of the experiment, even if carried out under identical conditions,
generally have different outcomes.

2. The possible outcomes of the experiment are known.

Thus, the outcome of a random experiment cannot be predicted with certainty. This
implies that the study of random experiments makes sense only if they can be repeated
sufficiently frequently under identical conditions. Only in this case can stochastic or
statistical regularities be found.

Let $\Omega$ be the set of possible outcomes of a random experiment. This set is called
sample space, space of elementary events, or universe. Examples of random experiments
and their respective sample spaces are:

1) Counting the number of traffic accidents a day in a specified area: $\Omega = \{0, 1, \ldots\}$.

2) Counting the number of cars in a parking area with at most 200 parking bays at
a fixed time point: $\Omega = \{0, 1, \ldots, 200\}$.

3) Counting the number of shooting stars during a fixed time interval: $\Omega = \{0, 1, \ldots\}$.

4) Recording the daily maximum wind velocity at a fixed location: $\Omega = [0, \infty)$.

5) Recording the lifetimes of technical systems or organisms: $\Omega = [0, \infty)$.

6) Determining the number of faulty parts in a set of 1000: $\Omega = \{0, 1, \ldots, 1000\}$.

7) Recording the daily maximum fluctuation of a share price: $\Omega = [0, \infty)$.

8) The total profit somebody makes with her/his financial investments in a year.
This 'profit' can be negative, i.e. any real number can be the outcome: $\Omega = (-\infty, +\infty)$.

9) Predicting the outcome of a wood reserve inventory in a forest stand: $\Omega = [0, \infty)$.

10) a) Number of eggs a sea turtle will bury at the beach: $\Omega = \{0, 1, \ldots\}$.

b) Will a baby turtle, hatched from such an egg, reach the water? $\Omega = \{0, 1\}$ with
meaning 0: no, 1: yes.


As the examples show, in the context of a random experiment, the term 'experiment' 
has a more general meaning than in the customary sense. 


A random experiment may also contain a deterministic component. For instance, the 
measurement of a physical quantity should ideally yield the exact (deterministic) 
parameter value. But in view of random measurement errors and other (subjective) 
influences, this ideal case does not materialize. Depending on the degree of accuracy 
required, different measurements, even if done under identical conditions, may yield 
different values of one and the same parameter (length, temperature, pressure, amperage, ...).


1.2 RANDOM EVENTS


A possible outcome of a random experiment, i.e. any $\omega \in \Omega$, is called an elementary
event or a simple event.


1) The sample space of the random experiment 'throwing two dice' consists of 36
simple events: $\Omega = \{(i, j);\ i, j = 1, 2, \ldots, 6\}$. The gambler wins if the sum $i + j$ is at
least 10. Hence, the 'winning simple events' are (4,6), (5,5), (5,6), (6,4), (6,5), and (6,6).


2) In a delivery of 100 parts some may be defective. A subset (sample) of $n = 12$ parts
is taken, and the number $N$ of defective parts in the sample is counted. The elementary
events are 0, 1, ..., 12 (possible numbers of defective parts in the sample). The
delivery is rejected if $N \ge 4$.

3) In training, a hunter shoots at a cardboard dummy. Given that he never misses the
dummy, the latter is the sample space $\Omega$, and any possible impact mark on the dummy
is an elementary event. Crucial subsets to be hit are e.g. 'head' or 'heart.'
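
For the two-dice experiment of example 1, the sample space and the set of winning outcomes are small enough to be listed by a program. The following Python sketch is an added illustration; the variable names are hypothetical. It only enumerates sets and does not assume anything about probabilities.

```python
from itertools import product

# Sample space of 'throwing two dice': all ordered pairs (i, j) with i, j = 1, ..., 6.
omega = set(product(range(1, 7), repeat=2))

# Set of elementary events for which the gambler wins (sum at least 10).
winning = {(i, j) for (i, j) in omega if i + j >= 10}

print("number of elementary events:", len(omega))
print("winning elementary events:", sorted(winning))

# An observed outcome decides whether the gambler has won.
observed = (6, 4)
print("gambler wins with", observed, ":", observed in winning)
```

The set `winning` is an example of what the text goes on to discuss: a set of elementary events that is of interest as a whole.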

Already these three examples illustrate that often not single elementary events are 
interesting, but sets of elementary events. Hence it is not surprising that concepts and 
results from set theory play a key role in formally establishing probability theory. For 
this reason, next the reader will be reminded of some basic concepts of set theory. 


Basic Concepts and Notation from Set Theory A set is given by its elements. We 
can consider the set of all real numbers, the set of all rational numbers, the set of all 
people attending a performance, the set of buffalos in a national park, and so on. A 
set is called discrete if it is a finite or a countably infinite set. By definition, a count- 
ably infinite set can be written as a sequence. In other words, its elements can be 
numbered. If a set is infinite, but not countably infinite, then it is called nondenumerable.
Nondenumerable sets are for instance the whole real axis, the positive half-axis,
a finite subinterval of the real axis, or a geometric object (area of a circle, target). 


Let A and B be two sets. In what follows we assume that all sets A, B, ... considered
are subsets of a 'universal set' Ω. Hence, for any set A, A ⊆ Ω.


A is called a subset of B if each element of A is also an element of B.
Symbol: A ⊆ B.


The complement of B with regard to A contains all those elements of B which are not
elements of A.


Symbol: B\A

In particular, Ā = Ω\A contains all those elements which are not elements of A.

The intersection of A and B contains all those elements which belong both to A and B.
Symbol: A ∩ B

The union of A and B contains all those elements which belong to A or B (or to both).
Symbol: A ∪ B


These relations between two sets are illustrated in Figure 1.1 (Venn diagram). The
whole shaded area is A ∪ B.


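Figure 1.1 Venn diagram (sets A and B within Ω, with the region A\B marked)

For any sequence of sets $A_1, A_2, \ldots, A_n$, intersection and union are defined as

$$\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n, \qquad \bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n.$$

De Morgan Rules for 2 Sets

$$\overline{A \cup B} = \bar{A} \cap \bar{B}, \qquad \overline{A \cap B} = \bar{A} \cup \bar{B}. \tag{1.1}$$

De Morgan Rules for n Sets

$$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A}_i, \qquad \overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A}_i. \tag{1.2}$$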


Random Events A random event (briefly: event) A is a subset of the set Ω of all
possible outcomes of a random experiment, i.e. A ⊆ Ω.


A random event A is said to have occurred as a result of a random experiment
if the observed outcome ω of this experiment is an element of A: ω ∈ A.


The empty set ∅ is the impossible event: since it does not contain any elementary
event, it can never occur. Likewise, Ω is the certain event, since it comprises all
possible outcomes of the random experiment. Thus, there is nothing random about the
events ∅ and Ω. They are actually deterministic events. Even before having completed
a random experiment, we are absolutely sure that Ω will occur and ∅ will not.


Let A and B be two events. Then the set-theoretic operations introduced above can be
interpreted in terms of the occurrence of random events as follows:

A ∩ B is the event that both A and B occur,

A ∪ B is the event that A or B (or both) occur,

If A ⊆ B (A is a subset of B), then the occurrence of A implies the occurrence of B.


A\B is the set of all those elementary events which are elements of A, but not of B.
Thus, A\B is the event that A occurs, but not B. Note that (see Figure 1.1)


$$A \setminus B = A \setminus (A \cap B). \tag{1.3}$$


The event Ā = Ω\A is called the complement of A. It consists of all those elementary
events which are not in A.


Two events A and B are called disjoint or (mutually) exclusive if their joint occurrence
is impossible, i.e. if A ∩ B = ∅. In this case the occurrence of A implies that B
cannot occur and vice versa. In particular, A and Ā are disjoint for any event A ⊆ Ω.


Short Terminology

A ∩ B    A and B

A ∪ B    A or B

A ⊆ B    A implies B, B follows from A
A\B      A but not B


Ā        not A

Example 1.1 Let us consider the random experiment 'throwing a die' with sample
space Ω = {1, 2, ..., 6} and the random events A = {2, 3} and B = {3, 4, 6}. Then,
A ∩ B = {3} and A ∪ B = {2, 3, 4, 6}. Thus, if a 3 has been thrown, then both the
events A and B have occurred. Hence, A and B are not disjoint. Moreover, A\B = {2},
B\A = {4, 6}, and Ā = {1, 4, 5, 6}. □
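
The set operations of Example 1.1 can be replayed with Python's built-in set type. The following is only a minimal sketch; the variable names (omega, A, B and so on) are chosen here for illustration and do not come from the text.

```python
# Sample space and events of Example 1.1 (throwing one die).
omega = set(range(1, 7))
A = {2, 3}
B = {3, 4, 6}

intersection = A & B        # A and B
union = A | B               # A or B
a_minus_b = A - B           # A but not B
b_minus_a = B - A           # B but not A
complement_a = omega - A    # not A

# De Morgan rules (1.1), checked for this particular pair of events only.
de_morgan_1 = (omega - (A | B)) == ((omega - A) & (omega - B))
de_morgan_2 = (omega - (A & B)) == ((omega - A) | (omega - B))

print(intersection, union, a_minus_b, b_minus_a, complement_a)
print(de_morgan_1, de_morgan_2)
```

Such a check confirms the rules for one example; it is of course no substitute for the general proof.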


Example 1.2 Two dice $D_1$ and $D_2$ are thrown. The sample space is

$$\Omega = \{(i_1, i_2);\ i_1, i_2 = 1, 2, \ldots, 6\}.$$

Thus, an elementary event ω consists of two integers indicating the results $i_1$ and $i_2$
of $D_1$ and $D_2$, respectively. Let $A = \{i_1 + i_2 \le 3\}$ and $B = \{i_1/i_2 = 2\}$. Then,

$$A = \{(1,1), (1,2), (2,1)\}, \qquad B = \{(2,1), (4,2), (6,3)\}.$$

Hence,

$$A \cap B = \{(2,1)\}, \qquad A \cup B = \{(1,1), (1,2), (2,1), (4,2), (6,3)\},$$

and $A \setminus B = \{(1,1), (1,2)\}$. □


Example 1.3 A company is provided with power by three generators $G_1$, $G_2$, and
$G_3$. The company has sufficient power to maintain its production if only two out of
the three generators are operating. Let $A_i$ be the event that generator $G_i$, i = 1, 2, 3, is
operating, and B be the event that at least two generators are operating. Then,

$$B = (A_1 \cap A_2 \cap A_3) \cup (A_1 \cap A_2 \cap \bar{A}_3) \cup (A_1 \cap \bar{A}_2 \cap A_3) \cup (\bar{A}_1 \cap A_2 \cap A_3).$$ □


1.3 PROBABILITY


The aim of this section is to construct rules for determining the probabilities
of random events. Such a rule is in principle given by a function P on the set E of all
random events A: P = P(A), A ∈ E.


Note that in this context A is an element of the set E so that the notation A ⊆ E would not be
correct. Moreover, not all subsets of Ω need to be random events, i.e., the set E need not
necessarily be the set of all possible subsets of Ω.


The function P assigns to every event A a number P(A), which is its probability. Of
course, the construction of such a function cannot be done arbitrarily. It has to be
done in such a way that some obvious properties are fulfilled. For instance, if A implies
the occurrence of the event B, i.e. A ⊆ B, then B occurs at least as frequently as A
so that the relation P(A) ≤ P(B) should be valid. If in addition the function P has
properties P(∅) = 0 and P(Ω) = 1, then the probabilities of random events indeed
yield the desired information about their degree of uncertainty: The closer P(A) is
to 0, the more unlikely is the occurrence of A, and the closer P(A) is to 1, the more
likely becomes the occurrence of A.

To formalize this intuitive approach, let for now P = P(A) be a function on E with
properties

I) P(∅) = 0, P(Ω) = 1,    II) If A ⊆ B, then P(A) ≤ P(B).

As a corollary from these two properties we get the following property of P:

III) For any event A, 0 ≤ P(A) ≤ 1.


1.3.1 Classical Definition of Probability 


The classical concept of probability is based on the following two assumptions: 
1) The space Q of the elementary events is finite. 


2) As a result of the underlying random experiment, each elementary event has the 
same probability to occur. 


A random experiment with properties 1 and 2 is called a Laplace random experiment.
Let n be the total number of elementary events (i.e. the cardinality of Ω). Then any
random event A ⊆ Ω consisting of m elementary events has probability
$$P(A) = m/n. \tag{1.4}$$
Let $\Omega = \{a_1, a_2, \ldots, a_n\}$. Then every elementary event has probability
$$P(a_i) = 1/n, \quad i = 1, 2, \ldots, n.$$


Obviously, this definition of probability satisfies the properties I, II, and III listed
above. The integer m is said to be the number of favorable cases (for the occurrence
of A), and n is the number of possible cases.

The classical definition of probability arose in the Middle Ages to be able to determine 
the chances to win in various games of chance. Then formula (1.4) is applicable given 
that the players are honest and do not use marked cards or manipulated dice. For 
instance, what is the probability of the event A that throwing a die yields an even 
number? In this case, A = {2,4,6} so that m = 3 and P(A) =3/6=0.5. 


Example 1.4 When throwing 3 dice, what is more likely, to achieve the total sum 9
(event $A_9$) or the total sum 10 (event $A_{10}$)? The corresponding sample space is
$$\Omega = \{(i, j, k);\ 1 \le i, j, k \le 6\} \quad\text{with}\quad n = 6^3 = 216$$
possible outcomes. The integers 9 and 10 can be represented as a sum of 3 positive
integers, each not exceeding 6, in the following ways:
$$9 = 3+3+3 = 4+3+2 = 4+4+1 = 5+2+2 = 5+3+1 = 6+2+1,$$

$$10 = 4+3+3 = 4+4+2 = 5+3+2 = 5+4+1 = 6+2+2 = 6+3+1.$$

The sum 3+3+3 corresponds to the event $A_{333}$ = 'every die shows a 3' = {(3,3,3)}.


The sum 4+3+2 corresponds to the event $A_{432}$ that one die shows a 4, another die a
3, and the remaining one a 2:
$$A_{432} = \{(2,3,4), (2,4,3), (3,2,4), (3,4,2), (4,2,3), (4,3,2)\}.$$
Analogously,

$$A_{441} = \{(1,4,4), (4,1,4), (4,4,1)\}, \qquad A_{522} = \{(2,2,5), (2,5,2), (5,2,2)\},$$

$$A_{531} = \{(1,3,5), (1,5,3), (3,1,5), (3,5,1), (5,1,3), (5,3,1)\},$$

$$A_{621} = \{(1,2,6), (1,6,2), (2,1,6), (2,6,1), (6,1,2), (6,2,1)\}.$$
Corresponding to the given sum representations for 9 and 10, the numbers of favorable
elementary events belonging to the events $A_9$ and $A_{10}$, respectively, are

$$m_9 = 1+6+3+3+6+6 = 25, \qquad m_{10} = 3+3+6+6+3+6 = 27.$$
Hence, the desired probabilities are:
$$P(A_9) = 25/216 = 0.116, \qquad P(A_{10}) = 27/216 = 0.125.$$


The dice gamblers of the Middle Ages could not mathematically prove this result,
but from their experience they knew that $P(A_9) < P(A_{10})$. □
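
The counts in Example 1.4 can be cross-checked by listing all 216 outcomes and applying formula (1.4) directly. The sketch below is one possible way to do this; the helper name sum_probability and its parameters are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

def sum_probability(total, dice=3, faces=6):
    """Classical probability (1.4) that `dice` fair dice add up to `total`."""
    outcomes = list(product(range(1, faces + 1), repeat=dice))
    favorable = sum(1 for outcome in outcomes if sum(outcome) == total)
    return favorable, len(outcomes), Fraction(favorable, len(outcomes))

m9, n, p9 = sum_probability(9)
m10, _, p10 = sum_probability(10)

print(m9, m10, n)              # favorable cases for 9 and 10, possible cases
print(float(p9), float(p10))   # the two probabilities as decimals
```

Brute-force enumeration is feasible here only because the sample space is tiny; for larger problems the combinatorial arguments of the example are the better tool.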


Example 1.5 d dice are thrown at the same time.

What is the smallest number d = d* with the property that the probability of the event
A = 'no die shows a 6' does not exceed 0.1?

The problem makes sense, since with increasing d the probability P(A) tends to 0,
and if d = 1, then P(A) = 5/6. For d ≥ 1, the corresponding space of elementary
events Ω has $n = 6^d$ elements, namely the vectors $(i_1, i_2, \ldots, i_d)$, where the $i_k$ are
integers between 1 and 6. Amongst the $6^d$ elementary events those are favorable for
the occurrence of A, where the $i_k$ only assume integers between 1 and 5. Hence, for
the occurrence of A exactly $5^d$ elementary events are favorable:


$$P(A) = 5^d / 6^d.$$
The inequality $5^d/6^d \le 0.1$ is equivalent to


$$d \ln(5/6) \le \ln(0.1) \quad\text{or}\quad d\,(-0.1823) \le -2.3026 \quad\text{or}\quad d \ge \frac{2.3026}{0.1823} = 12.63.$$


Hence, d* = 13. □
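
The result of Example 1.5 can be confirmed numerically, both by increasing d step by step and by evaluating the logarithmic bound. This is a small sketch under the assumptions of the example (fair dice); the function name smallest_number_of_dice is hypothetical.

```python
import math

def smallest_number_of_dice(bound=0.1, faces=6):
    """Smallest d with P('no die shows a 6') = ((faces-1)/faces)**d <= bound."""
    d = 1
    while ((faces - 1) / faces) ** d > bound:
        d += 1
    return d

d_star = smallest_number_of_dice()

# Closed form from the logarithmic inequality: d >= ln(0.1) / ln(5/6).
d_closed_form = math.ceil(math.log(0.1) / math.log(5 / 6))

print(d_star, d_closed_form)
```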


Binomial Coefficient and Factorial For solving the next examples, we need a result
from elementary combinatorics: The number of possibilities to select subsets of k
different elements from a set of n different elements, 1 ≤ k ≤ n, is given by the
binomial coefficient $\binom{n}{k}$, which is defined as


$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}, \tag{1.5}$$
where k! is the factorial of k: $k! = 1 \cdot 2 \cdots k$. By agreement


$$\binom{n}{0} = 1 \quad\text{and}\quad 0! = 1.$$
The factorial of a positive integer has its own significance in combinatorics:


There are k! different possibilities to order a set of k different objects.


Example 1.6 An optimist buys one ticket in a '6 out of 49' lottery and hopes for
hitting the jackpot. What are his chances? There are

$$\binom{49}{6} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6!} = 13\,983\,816$$

different possibilities to select 6 numbers out of 49. Thus, one has to fill in almost 14
million tickets to make absolutely sure that the winning one is amongst them. Here
m = 1 and n = 13 983 816. Hence, the probability $p_6$ of having picked the six
'correct' numbers is

$$p_6 = \frac{1}{13\,983\,816} = 0.0000000715.$$ □


The classical definition of probability satisfies the properties P(∅) = 0 and P(Ω) = 1,
since the impossible event ∅ does not contain any elementary events (m = 0) and
the certain event Ω comprises all elementary events (m = n).

Now, let A and B be two events containing $m_A$ and $m_B$ elementary events, respectively.
If A ⊆ B, then $m_A \le m_B$ so that P(A) ≤ P(B). If the events A and B are disjoint,
then they have no elementary events in common so that A ∪ B contains $m_A + m_B$
elementary events. Hence


$$P(A \cup B) = \frac{m_A + m_B}{n} = \frac{m_A}{n} + \frac{m_B}{n} = P(A) + P(B)$$
or
$$P(A \cup B) = P(A) + P(B) \quad\text{if}\quad A \cap B = \emptyset. \tag{1.6}$$


More generally, if $A_1, A_2, \ldots, A_r$ are pairwise disjoint events, then


$$P(A_1 \cup A_2 \cup \cdots \cup A_r) = P(A_1) + P(A_2) + \cdots + P(A_r), \quad A_i \cap A_k = \emptyset,\ i \ne k. \tag{1.7}$$


Example 1.7 When participating in the lottery '6 out of 49' with one ticket, what is
the probability of the event A to have at least 4 correct numbers?


Let $A_i$ be the event of having got exactly i numbers correct. Then,
$$A = A_4 \cup A_5 \cup A_6.$$
$A_4$, $A_5$, and $A_6$ are pairwise disjoint events. (It is impossible that there are on one
and the same ticket, say, exactly 4 and exactly 5 correct numbers.) Hence,
$$P(A) = P(A_4) + P(A_5) + P(A_6).$$
There are $\binom{6}{4} = 15$ possibilities to choose 4 numbers from the 6 'correct' ones. To
each of these 15 choices there are
$$\binom{43}{2} = \frac{43 \cdot 42}{2!} = 903$$
possibilities to pick 2 numbers from the 43 'wrong' numbers. Therefore, favorable for
the occurrence of $A_4$ are $m_4 = 15 \cdot 903 = 13\,545$ elementary events. Hence,


$$p_4 = P(A_4) = 13\,545 / 13\,983\,816 = 0.0009686197.$$


Analogously,


$$p_5 = P(A_5) = \frac{\binom{6}{5}\binom{43}{1}}{\binom{49}{6}} = \frac{258}{13\,983\,816} = 0.0000184499.$$


Together with the result of example 1.6, $P(A) = p_4 + p_5 + p_6 = 0.0009871411$, i.e.,
on average about 1000 tickets have to be bought to achieve the desired result. □
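
The lottery probabilities of Examples 1.6 and 1.7 can be recomputed with exact rational arithmetic. The sketch below relies on math.comb from the Python standard library (available from Python 3.8); the helper prob_exactly_correct and its parameter names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def prob_exactly_correct(i, drawn=6, pool=49):
    """Probability of exactly i correct numbers on one ticket."""
    favorable = comb(drawn, i) * comb(pool - drawn, drawn - i)
    return Fraction(favorable, comb(pool, drawn))

p4, p5, p6 = (prob_exactly_correct(i) for i in (4, 5, 6))

# Formula (1.7): the three events are pairwise disjoint, so probabilities add.
p_at_least_4 = p4 + p5 + p6

print(comb(49, 6))
print(float(p4), float(p5), float(p6), float(p_at_least_4))
```

Working with Fraction avoids rounding until the final conversion to a decimal, which makes it easy to compare the printed values with those quoted in the examples.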


1.3.2 Geometric Definition of Probability 


The geometric definition of probability refers to random experiments in which
every outcome has the same chance to occur (as with Laplace experiments), but the
sample space Ω is a bounded subset of the one-, two-, or three-dimensional Euclidean
space (real line, plane, space). Hence, in each case Ω is a nondenumerable set. In
most applications, Ω is a finite interval, a rectangle, a circle, a cube, or a sphere.


Let A ⊆ Ω be a random event. Then we denote by μ(A) the measure of A. For
instance, if Ω is a finite interval, then μ(Ω) is the length of this interval. If A is the
union of disjoint subintervals of Ω, then μ(A) is the total length of these subintervals.
(We do not consider subsets like the set of all irrational numbers in a finite
interval.) If Ω is a rectangle and A is a circle embedded in this rectangle, then
μ(A) is the area of this circle and so on. If μ is defined in this way, then


$$A \subseteq B \subseteq \Omega \quad\text{implies}\quad \mu(A) \le \mu(B) \le \mu(\Omega).$$


Under the assumptions stated, a probability is assigned to every event A ⊆ Ω by


$$P(A) = \frac{\mu(A)}{\mu(\Omega)}. \tag{1.8}$$

For disjoint events A and B, μ(A ∪ B) = μ(A) + μ(B) so that formulas (1.6) and (1.7)
are true again. Analogously to the classical probability, μ(A) can be interpreted as
the measure of all elementary events which are favorable to the occurrence of A.
With the given interpretation of the measure μ(·), every elementary event, i.e. every
point in Ω, has measure and probability 0 (different from the Laplace random
experiment). (A point, whether on a line, in a plane, or in space, always has extension 0 in all
directions.) But the assumption "every elementary event has the same chance to
occur" is not equivalent to the fact that every elementary event has probability 0.
Rather, this assumption has to be understood in the following sense:


All those random events which have the same measure have the same probability.

Thus, no matter where the events (subsets of Ω) with the same measure are located
in Ω and however small their measure is, the outcome of the random experiment will
be in any of these events with the same probability, i.e., no area in Ω is preferred with
regard to the occurrence of elementary events.


Example 1.8 For the sake of a tensile test, a wire is clamped at its ends so that the
free wire has a length of 400 cm. The wire is supposed to be homogeneous with
regard to its physical parameters. Under these assumptions, the probability p that the
wire will tear between 0 and 40 cm or between 360 and 400 cm is


$$p = \frac{40 + 40}{400} = 0.2.$$

Repeated tensile tests will confirm or reject the assumption that the wire is indeed
homogeneous. □

Figure 1.2 Illustration to example 1.9


Example 1.9 Two numbers x and y are randomly picked from the interval [0, 1].
What is the probability that x and y satisfy both the conditions


$$x + y \ge 1 \quad\text{and}\quad x^2 + y^2 \le 1\,?$$


Note: In this context, 'randomly' means that every number between 0 and 1 has the same
chance of being picked.


In this case the sample space is the unit square Ω = [0 ≤ x ≤ 1, 0 ≤ y ≤ 1], since an
equivalent formulation of the problem is to pick a point at random from the unit
square and to ask whether it is favorable for the occurrence of the event


$$A = \{(x, y);\ x + y \ge 1,\ x^2 + y^2 \le 1\}.$$
Figure 1.2 shows the area (hatched) given by A, whereas the 'possible area' Ω is left
white, but also includes the hatched area. Since μ(Ω) = 1 and μ(A) = π/4 - 0.5 (area
of a quarter of a circle with radius 1 minus the area of the half of a unit square),


$$P(A) = \mu(A) = 0.2854.$$ □
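
The geometric probability of Example 1.9 can be checked by a simple simulation: points are drawn uniformly from the unit square and the share falling into A is compared with π/4 - 0.5. This is only a sketch; the function name, the number of points, and the seed are arbitrary choices made here, not part of the example.

```python
import math
import random

def estimate_probability(n_points=200_000, seed=1):
    """Share of uniformly drawn points of the unit square that fall into A."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        # A = {(x, y); x + y >= 1, x^2 + y^2 <= 1}
        if x + y >= 1 and x * x + y * y <= 1:
            hits += 1
    return hits / n_points

exact = math.pi / 4 - 0.5
estimate = estimate_probability()
print(exact, estimate)
```

The simulated share fluctuates around the exact value; more points narrow the typical deviation.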


Example 1.10 (Buffon's needle problem) On a flat surface, parallel straight lines
are drawn at a distance of a cm. Onto this surface a needle of length L is thrown, L < a.
What is the probability of the event A that the needle and a parallel intersect?

Figure 1.3 Illustration to example 1.10 (panels a and b; curve y = L sin α)


The position of the needle on the surface is fully determined by the distance y of its
'lower' endpoint to the 'upper' parallel and by its angle of inclination α to the parallels
(Figure 1.3a), since a shift of the needle parallel to the lines obviously has no influence
on the desired probability. Thus, the sample space is given by the rectangle
$$\Omega = \{(y, \alpha);\ 0 \le y \le a,\ 0 \le \alpha \le \pi\}$$
with area μ(Ω) = aπ (Figure 1.3b). Hence, Buffon's needle problem formally consists
in randomly picking elementary events given by (y, α) from the rectangle Ω. Since
the needle and the upper parallel intersect if and only if y < L sin α, the favorable
area for the occurrence of A is given by the hatched part in Figure 1.3b. The area of
this part is
$$\mu(A) = \int_0^{\pi} L \sin\alpha \, d\alpha = L[-\cos\alpha]_0^{\pi} = L[1 + 1] = 2L.$$


Hence, the desired probability is P(A) = 2L/(aπ). □


1.3.3 Axiomatic Definition of Probability 


The classical and the geometric concepts of probability are only applicable to very 
restricted classes of random experiments. But these concepts have illustrated which 
general properties a universally applicable probability definition should have: 


Definition 1.1 A function P = P(A) on the set of all random events E with ∅ ∈ E and
Ω ∈ E is called probability if it has the following properties:


I) P(Ω) = 1.
II) For any A ∈ E, 0 ≤ P(A) ≤ 1.
III) For any sequence of disjoint events $A_1, A_2, \ldots$, i.e., $A_i \cap A_j = \emptyset$ for i ≠ j,


$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i). \tag{1.9}$$


Property III makes sense only if with $A_i \in E$ the union $\bigcup_{i=1}^{\infty} A_i$ is also an element of
E. Hence we assume that the set of all random events E is a σ-algebra:

Definition 1.2 Any set of random events E is called a σ-algebra if
1) Ω ∈ E.
2) If A ∈ E, then Ā ∈ E. In particular, $\bar{\Omega} = \emptyset \in E$.
3) For any sequence $A_1, A_2, \ldots$ with $A_i \in E$, the union $\bigcup_{i=1}^{\infty} A_i$ is also a random
event, i.e.,
$$\bigcup_{i=1}^{\infty} A_i \in E.$$


[Ω, E] is called a measurable space, and [Ω, E, P] is called a probability space.


Note: In case of a finite or a countably infinite set Ω, the set E is usually the power set of Ω,
i.e. the set of all subsets of Ω. A power set is, of course, always a σ-algebra. In this book,
taking into account its applied orientation, specifying explicitly the underlying σ-algebra is
usually not necessary. [Ω, E] is called a measurable space, since to any random event A ∈ E a
measure, namely its probability, can be assigned. In view of the de Morgan rules (1.1): If A and
B are elements of E, then A ∩ B as well.


Given that E is a σ-algebra, properties I-III of definition 1.1 imply all the properties
of the probability functions, which we found useful in sections 1.3.1 and 1.3.2:


a) Let $A_i = \emptyset$ for i = n+1, n+2, .... Then, from III),


$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i), \quad A_i \cap A_j = \emptyset,\ i \ne j,\ i, j = 1, 2, \ldots, n. \tag{1.10}$$
In particular, letting n = 2 and $A = A_1$, $B = A_2$, this formula implies
$$P(A \cup B) = P(A) + P(B) \quad\text{if}\quad A \cap B = \emptyset. \tag{1.11}$$
With B = Ā, taking into account Ω = A ∪ Ā and P(Ω) = 1, formula (1.11) yields
$$P(A) + P(\bar{A}) = 1 \quad\text{or}\quad P(\bar{A}) = 1 - P(A). \tag{1.12}$$
Applying (1.12) with A = Ω yields P(Ω) + P(∅) = 1, so that
$$P(\emptyset) = 0, \quad P(\Omega) = 1. \tag{1.13}$$


Note that P(Ω) = 1 is part of definition 1.1.


b) If A and B are two events with A ⊆ B, then B can be represented as B = A ∪ (B\A).
Since A and B\A are disjoint, by (1.11), P(B) = P(A) + P(B\A) or, equivalently,


$$P(B \setminus A) = P(B) - P(A) \quad\text{if}\quad A \subseteq B. \tag{1.14}$$

Therefore,
$$P(A) \le P(B) \quad\text{if}\quad A \subseteq B. \tag{1.15}$$

c) For any events A and B, the event A ∪ B can be represented as follows (Figure 1.1)
$$A \cup B = \{A \setminus (A \cap B)\} \cup \{B \setminus (A \cap B)\} \cup (A \cap B).$$


In this representation, the three events combined by '∪' are disjoint. Hence, by (1.10)
with n = 3:
$$P(A \cup B) = P(\{A \setminus (A \cap B)\}) + P(\{B \setminus (A \cap B)\}) + P(A \cap B).$$
On the other hand, since (A ∩ B) ⊆ A and (A ∩ B) ⊆ B, from (1.14),
$$P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{1.16}$$
Given any 3 events A, B, and C, the probability of the event A ∪ B ∪ C can be determined
by replacing in (1.16) A with A ∪ B and B with C. This yields
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C). \tag{1.17}$$


d) For any n events $A_1, A_2, \ldots, A_n$ one obtains by repeated application of (1.16)
(more exactly, by induction) the Inclusion-Exclusion Formula or the Formula of
Poincaré for the probability of the event $A_1 \cup A_2 \cup \cdots \cup A_n$:


$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{k=1}^{n} (-1)^{k+1} R_k \tag{1.18}$$

with

$$R_k = \sum_{(i_1 < i_2 < \cdots < i_k)} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}),$$

where the summation runs over all k-dimensional vectors $(i_1, i_2, \ldots, i_k)$ out of the set
{1, 2, ..., n} with $1 \le i_1 < i_2 < \cdots < i_k \le n$ and k = 1, 2, ..., n. The sum representing $R_k$
has exactly $\binom{n}{k}$ terms, so that the total number of terms in (1.18) is

$$\sum_{k=1}^{n} \binom{n}{k} = 2^n - 1.$$

For instance, if n = 3, then the $R_k$ in (1.18) are
$$R_1 = P(A_1) + P(A_2) + P(A_3),$$
$$R_2 = P(A_1 \cap A_2) + P(A_1 \cap A_3) + P(A_2 \cap A_3),$$
$$R_3 = P(A_1 \cap A_2 \cap A_3).$$
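
For a finite sample space with equally likely outcomes, formula (1.18) can be evaluated mechanically and compared with the probability of the union computed directly. The sketch below does this for three events chosen here purely for illustration; the function names prob and inclusion_exclusion are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Classical probability (1.4) of an event in a finite sample space."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events, omega):
    """Right-hand side of (1.18): sum over k of (-1)**(k+1) * R_k."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        # R_k: sum over all index sets i_1 < ... < i_k of P(intersection).
        r_k = sum(
            prob(set.intersection(*subset), omega)
            for subset in combinations(events, k)
        )
        total += (-1) ** (k + 1) * r_k
    return total

# Illustrative events for one throw of a die (not taken from the text).
omega = set(range(1, 7))
events = [{1, 2}, {2, 3, 4}, {4, 6}]

lhs = prob(set.union(*events), omega)
rhs = inclusion_exclusion(events, omega)
print(lhs, rhs)
```

The loop visits all 2^n - 1 non-empty index sets, which matches the number of terms stated for (1.18) and shows why the formula becomes expensive for large n.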


Figure 1.4 Computer network with 4 computers (nodes 1, 2, 3, 4 linked by the cables e1, ..., e5)


Example 1.11 Figure 1.4 shows a simple local computer network. Computers are
located at nodes 1, 2, 3, and 4. The transmission of data between the computers is
possible via cables $e_1, e_2, \ldots, e_5$, which link the four computers. Cable $e_i$ is available,
i.e. in a position to transfer information, with probability $p_i$ and unavailable
(e.g. under maintenance, waiting for maintenance, waiting for replacement after having
been stolen) with probability $q_i = 1 - p_i$, i = 1, 2, ..., 5.
What is the probability of the event A that the computer at node 1 can transfer data to
the computer at node 4 via one or more paths (chains) of available edges which connect
node 1 to node 4? There are four potential candidates for such paths:


$$w_1 = \{e_1, e_4\}, \quad w_2 = \{e_2, e_5\}, \quad w_3 = \{e_1, e_3, e_5\}, \quad w_4 = \{e_2, e_3, e_4\}.$$
Let $A_i$ be the event that all edges in path $w_i$ are available, i = 1, 2, 3, 4. Then event A
occurs when at least one of these four events occurs. Hence, A can be represented as
$$A = A_1 \cup A_2 \cup A_3 \cup A_4.$$
The $A_i$ are not disjoint. Hence we apply the inclusion-exclusion formula (1.18) for
determining P(A):
$$P(A) = P(A_1 \cup A_2 \cup A_3 \cup A_4) = R_1 - R_2 + R_3 - R_4$$
with
$$R_1 = P(A_1) + P(A_2) + P(A_3) + P(A_4),$$
$$R_2 = P(A_1 \cap A_2) + P(A_1 \cap A_3) + P(A_1 \cap A_4) + P(A_2 \cap A_3) + P(A_2 \cap A_4) + P(A_3 \cap A_4),$$
$$R_3 = P(A_1 \cap A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_4) + P(A_1 \cap A_3 \cap A_4) + P(A_2 \cap A_3 \cap A_4),$$
$$R_4 = P(A_1 \cap A_2 \cap A_3 \cap A_4).$$
The event $A_1 \cap A_2$ means that the cables of both paths $w_1$ and $w_2$ are operating. Thus,
to the event $A_1 \cap A_2$ there belongs the set of cables $\{e_1, e_2, e_4, e_5\}$. Hence, the
notation $P(A_1 \cap A_2) = p_{1245}$ will be used. To the event $A_1 \cap A_2 \cap A_3$ there belongs
the set of cables $\{e_1, e_2, e_3, e_4, e_5\}$: $P(A_1 \cap A_2 \cap A_3) = p_{12345}$. If this notation
is applied to all other probabilities occurring in the $R_k$, then
$$R_1 = p_{14} + p_{25} + p_{135} + p_{234},$$
$$R_2 = p_{1245} + p_{1345} + p_{1234} + p_{1235} + p_{2345} + p_{12345},$$
$$R_3 = p_{12345} + p_{12345} + p_{12345} + p_{12345}, \qquad R_4 = p_{12345}.$$
The desired probability is
$$P(A) = p_{14} + p_{25} + p_{135} + p_{234} - p_{1245} - p_{1345} - p_{1234} - p_{1235} - p_{2345} + 2p_{12345}.$$


In section 1.4.2, an additional assumption on the operating mode of the cables will
be imposed which enables the calculation of P(A) only on the basis of the $p_i$. □
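
The inclusion-exclusion result for the network can be cross-checked by brute force once joint probabilities are available. The sketch below ASSUMES, for illustration only, that the cables are available independently of each other, so that a joint probability such as $p_{1245}$ becomes the product $p_1 p_2 p_4 p_5$; this assumption and the numerical availabilities are not part of the example itself. Under it, summing the probabilities of all $2^5$ cable states in which node 1 is connected to node 4 must reproduce the formula, in which the term $p_{12345}$ enters with coefficient $-1 + 4 - 1 = 2$.

```python
from itertools import product

# Paths from node 1 to node 4 as index sets of cables (Example 1.11).
PATHS = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]

def connection_probability(p):
    """P(A) by enumerating all 2**5 cable states.

    ASSUMPTION (not made in the example itself): cables are available
    independently of each other; p[i] is the availability of cable e_i.
    """
    total = 0.0
    for state in product([True, False], repeat=5):
        up = {i for i, ok in enumerate(state, start=1) if ok}
        weight = 1.0
        for i, ok in enumerate(state, start=1):
            weight *= p[i] if ok else 1 - p[i]
        # A occurs if all cables of at least one path are available.
        if any(path <= up for path in PATHS):
            total += weight
    return total

def joint(p, cables):
    """Joint availability of a set of cables under the same assumption."""
    result = 1.0
    for i in cables:
        result *= p[i]
    return result

def formula(p):
    """R_1 - R_2 + R_3 - R_4 of Example 1.11, written out term by term."""
    plus = [(1, 4), (2, 5), (1, 3, 5), (2, 3, 4)]
    minus = [(1, 2, 4, 5), (1, 3, 4, 5), (1, 2, 3, 4), (1, 2, 3, 5), (2, 3, 4, 5)]
    return (sum(joint(p, s) for s in plus)
            - sum(joint(p, s) for s in minus)
            + 2 * joint(p, (1, 2, 3, 4, 5)))

# Hypothetical availabilities, chosen only for illustration.
p = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.95, 5: 0.85}
print(connection_probability(p), formula(p))
```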


1.3.4 Relative Frequency 


The probabilities of random events are usually unknown. However, they can be
estimated by their relative frequencies. If in a series of n repetitions of one and the
same random experiment the event A has been observed exactly m = m(A) times,
then the relative frequency of A is given by

$$\hat{p}_n(A) = \frac{m(A)}{n}. \tag{1.19}$$

Generally, the relative frequency of A tends to P(A) as n increases:
$$\lim_{n \to \infty} \hat{p}_n(A) = P(A). \tag{1.20}$$


Thus, the probability of A can be estimated with any required level of accuracy from
its relative frequency by sufficiently frequently repeating the random experiment (for
the theoretical background see section 5.2.2). Empirical verifications of the limit
relation (1.20) were already given in the introduction by the coin experiments of Buffon
and Pearson. Without the validity of (1.20) the gamblers in the Middle Ages would
not have been in a position to empirically verify that, when throwing three dice, the
chance to obtain sum 9 is lower than the chance to obtain sum 10 (example 1.4).


It is interesting that the relationship (1.20) in connection with Buffon's needle problem
(example 1.10) allows us to estimate the number π with any desired degree of
accuracy. To do this, in the formula P(A) = 2L/(πa) the probability P(A) is replaced
with the relative frequency $\hat{p}_n(A)$ for the occurrence of A in a series of n needle
throws. This gives for π the estimate

$$\hat{\pi}_n = \frac{2L}{a\,\hat{p}_n(A)}.$$

Lazzarini (1901) threw the needle n = 3408 times and got for π the estimate

$$\hat{\pi}_{3408} = 3.1415929,$$

i.e., the first six decimals are exact. The approximate calculation of π was one
of the first examples of how to solve deterministic problems by probabilistic methods.
Nowadays, nobody needs to throw a needle manually several thousand times.
Computers 'simulate' random experiments of this simple structure many thousand times in
the twinkling of an eye.
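
A simulation of Buffon's needle experiment along the lines of example 1.10 might look as follows. It is a sketch only: the needle length, the line spacing, the number of throws, and the seed are arbitrary choices made here, and the angle is drawn with the help of math.pi, so the code illustrates the estimation principle rather than an independent way of computing π.

```python
import math
import random

def estimate_pi(n_throws=200_000, needle=1.0, spacing=2.0, seed=7):
    """Estimate pi as 2L / (a * relative frequency of intersections)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        y = rng.uniform(0.0, spacing)        # distance of the lower endpoint to the upper parallel
        alpha = rng.uniform(0.0, math.pi)    # angle of inclination to the parallels
        if y < needle * math.sin(alpha):     # needle and upper parallel intersect
            hits += 1
    relative_frequency = hits / n_throws
    return 2 * needle / (spacing * relative_frequency)

print(estimate_pi())
```

With a few hundred thousand throws the estimate is typically correct to about two decimals, which puts the accuracy reported for only 3408 manual throws into perspective.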


The reader may object that the approximate calculation of probabilities of all random
events by their relative frequency is practically not possible, in particular, if the
sample space is not finite. However, depending on the respective random experiment, the
probabilities of all its elementary events are frequently given by a unifying
mathematical pattern (model). For instance, the probability that the random number of traffic
accidents occurring in a specific area during a year is equal to k can frequently be
determined by the formula


$$p_k = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \ldots,$$


where λ is the average number of traffic accidents which occur a year in that area.
Hence, for determining all infinitely many probabilities $p_0, p_1, \ldots$, only the parameter
λ has to be estimated. This is done by counting the number $x_i$ of traffic accidents
occurring in year i over a period of n years and determining the arithmetic mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
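
The two steps just described, estimating the parameter by the arithmetic mean and then evaluating the formula for $p_k$, can be sketched in a few lines. The yearly accident counts below are invented purely for illustration, and the function name poisson_probability is likewise an assumption of this sketch.

```python
import math

def poisson_probability(k, lam):
    """p_k = lam**k / k! * exp(-lam)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

# Hypothetical yearly accident counts x_1, ..., x_n (invented for illustration).
counts = [3, 5, 2, 4, 6, 4, 3, 5]

# Arithmetic mean of the counts as the estimate of the parameter lambda.
lam_hat = sum(counts) / len(counts)

probabilities = [poisson_probability(k, lam_hat) for k in range(6)]
print(lam_hat, probabilities)
```

Once the single parameter is estimated, every probability $p_k$ follows from the model, which is the point made in the text.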
Defining and discussing mathematical models for the calculation of the probabilities
of random events is the subject of chapter 2.


1.4 CONDITIONAL PROBABILITY AND INDEPENDENCE OF RANDOM EVENTS


1.4.1 Conditional Probability 


Two random events A and B can depend on each other in the following sense: The
occurrence of B will change the probability of the occurrence of A and vice versa.
Hence, the additional piece of information 'B has occurred' should be used in order to
predict the probability of the occurrence of A more precisely. If one has to determine
the probability that a device does not fail during its guarantee period (event A), then
this probability may depend on the manufacturer of the device (event B) if there are
several of them who produce the same type. The probability of having a sunny day
on 21 August (event A) will increase if there is a sunny day on 20 August (event B)
in view of the inertia of weather patterns. The probability of contracting a certain
disease (event A) will usually be larger than average if there is or was a family member
who suffered from this disease (event B). If A is the random event to spot a
leopard in a certain area of a National Park during a safari, then the probability of A
increases if it is known that there are baboons in this area (event B).


Let us now consider some numerical examples to illustrate how to define the probability
of the occurrence of an event A given that another event B has occurred.


Example 1.12 A gambler throws the dice 1 and 2 simultaneously. What is the probability
that die 1 shows a 6 (event A) on condition that both dice showed an even
number (event B)? This probability will be denoted as P(A|B). The sample space is
$$\Omega = \{(i, j);\ i, j = 1, 2, \ldots, 6\}.$$
In terms of the elementary events (i, j), the events A and B are given by
$$A = \{(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)\},$$
$$B = \{(2,2), (2,4), (2,6), (4,2), (4,4), (4,6), (6,2), (6,4), (6,6)\}.$$
Hence,
P(A) = 6/36 and P(B) = 9/36.
On condition 'B has occurred' the sample space Ω reduces to the 9 elementary events
given by B. From these 9, only the 3 elementary events in the intersection
$$A \cap B = \{(6,2), (6,4), (6,6)\}$$
are favorable for the occurrence of A. Therefore,
P(A|B) = 3/9.
The following representation shows the general structure of P(A|B):


$$P(A \mid B) = \frac{3}{9} = \frac{3/36}{9/36} = \frac{P(A \cap B)}{P(B)}.$$

Example 1.13 In a bowl there are two white and two red marbles. The numbers 1
and 2 are assigned to the white marbles and the numbers 3 and 4 are assigned to the
red marbles. Two marbles are randomly picked from the bowl one after the other.
Find the probability of the event A that one of the drawn marbles is white and the
other red given the event B that the first drawn marble is white.


The sample space consists of 4 · 3 = 12 elementary events:
$$\Omega = \{(i, j);\ i \ne j,\ i, j = 1, 2, 3, 4\}.$$
The events A and B are given by the following sets of elementary events:
$$A = \{(1,3), (1,4), (2,3), (2,4), (3,1), (3,2), (4,1), (4,2)\},$$
$$B = \{(1,2), (1,3), (1,4), (2,1), (2,3), (2,4)\}.$$
Hence,
P(A) = 8/12 = 2/3 and P(B) = 6/12 = 1/2.
Since it is known that event B has happened, the space of possible elementary events
is given by B. Hence, the elementary events which are favorable for the occurrence
of event A are given by the intersection
$$A \cap B = \{(1,3), (1,4), (2,3), (2,4)\}.$$
This yields
$$P(A \mid B) = \frac{4}{6} = \frac{2}{3} = \frac{4/12}{6/12} = \frac{P(A \cap B)}{P(B)}.$$
For the sake of arriving at the general structure of P(A|B), the solution of the problem
has been made unnecessarily complicated. The problem can be solved quickly as
follows: If the first drawn marble is white (event B), then there are one white and two
red marbles left in the bowl. Event A occurs if one of the red marbles is drawn,
i.e., P(A|B) = 2/3. □
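
The counting argument used in Examples 1.12 and 1.13 (restrict the sample space to B, then count the outcomes that also belong to A) can be written as a small generic routine. The following is a sketch for Laplace experiments only; the function name conditional_probability and the use of predicates to describe events are choices made here for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

def conditional_probability(is_a, is_b, omega):
    """P(A|B) = P(A and B) / P(B) in a Laplace experiment; events are predicates."""
    b = [w for w in omega if is_b(w)]          # reduced sample space
    a_and_b = [w for w in b if is_a(w)]        # favorable outcomes within B
    return Fraction(len(a_and_b), len(b))

# Example 1.12: two dice, A = 'die 1 shows a 6', B = 'both dice show an even number'.
dice = list(product(range(1, 7), repeat=2))
p_dice = conditional_probability(
    lambda w: w[0] == 6,
    lambda w: w[0] % 2 == 0 and w[1] % 2 == 0,
    dice,
)

# Example 1.13: marbles 1, 2 are white and 3, 4 are red; ordered draws without replacement.
draws = list(permutations([1, 2, 3, 4], 2))
white = {1, 2}
p_marbles = conditional_probability(
    lambda w: (w[0] in white) != (w[1] in white),   # one white, one red
    lambda w: w[0] in white,                        # first marble is white
    draws,
)

print(p_dice, p_marbles)
```

The routine presupposes that B contains at least one outcome; otherwise the ratio is undefined.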


Example 1.14 The lifetimes of n = 1000 electronic units had been tested. 205 units
failed in the interval [0, 500 h), 180 units failed in the interval [500, 600 h), and the
remaining 615 units failed after 600 h. Let A be the event that a unit fails in the interval
[500, 600 h), and B be the event that a unit fails after a lifetime of at least 500 h.
By formula (1.19) with n = 1000, the relative frequencies for the occurrence of
events A and B are


$$\hat{p}_n(A) = \frac{m(A)}{n} = \frac{180}{1000}, \qquad \hat{p}_n(B) = \frac{m(B)}{n} = \frac{1000 - 205}{1000} = 0.795.$$


What is the relative frequency $\hat{p}_n(A \mid B)$ of the event A on condition that event B has
occurred?
Under this condition, only the 795 units which have survived the first 500 h need to
be taken into account. From these 795 units, 180 fail in [500, 600 h). Therefore,

$$\hat{p}_n(A \mid B) = \frac{180}{795} \approx 0.2264.$$

Since A ⊆ B, i.e. the occurrence of A implies the occurrence of B, event A satisfies
A = A ∩ B. Hence, the 'conditional relative frequency' $\hat{p}_n(A \mid B)$ can be written as

$$\hat{p}_n(A \mid B) = \frac{m(A \cap B)}{m(B)} = \frac{m(A \cap B)/n}{m(B)/n}. \tag{1.21}$$


By (1.20), the relative frequencies $m(A \cap B)/n$ and $m(B)/n$ tend to P(A ∩ B) and P(B) as
n → ∞, respectively. Thus, the conditional probability of A given B has again the
structure we know from the previous examples:

$$\lim_{n \to \infty} \hat{p}_n(A \mid B) = P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$ □


Now it is no longer surprising that the probability of 'A on condition B' or, equivalently,
the probability of 'A given B' is defined as follows.


Definition 1.3 Let A and B be two events with P(B) > 0. Then the probability of A on
condition B is given by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{1.22}$$

Note: P(A|B) is also denoted as the probability of A given B, the conditional probability of A
on condition B, or the conditional probability of A given B. Of course, in (1.22) the roles of A
and B can be changed.


If A and B are arbitrary random events, formula (1.22) implies a product formula for
the probability P(A ∩ B) of the joint occurrence of A and B:


$$P(A \cap B) = P(A \mid B)\,P(B) \quad\text{or}\quad P(A \cap B) = P(B \mid A)\,P(A). \tag{1.23}$$


Example 1.15 In a bowl there are three white and two red marbles. Two marbles are 
randomly taken out one after the other. What is the probability that both of these mar- 
bles are red? 


Let A and B be the events that the first and the second, respectively, of the chosen
marbles are red. Hence, the probability \(P(A \cap B)\) has to be determined. The
probability of A is equal to P(A) = 2/5. On condition A, there are 3 white marbles and
1 red marble in the bowl. Hence, P(B|A) = 1/4 so that

\[ P(A \cap B) = P(B|A)\,P(A) = \frac{1}{4} \cdot \frac{2}{5} = 0.1. \]
□
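The result of Example 1.15 can be cross-checked by listing all ordered draws of two marbles. This is only an illustrative sketch; it assumes that every ordered pair of distinct marbles is equally likely, and the names used are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Three white ("w") and two red ("r") marbles, identified by position.
bowl = ["w", "w", "w", "r", "r"]

# All ordered draws of two different marbles (drawing without replacement).
draws = list(permutations(range(len(bowl)), 2))
both_red = [d for d in draws if bowl[d[0]] == "r" and bowl[d[1]] == "r"]

# Classical probability: favorable draws divided by all draws.
p_enumerated = Fraction(len(both_red), len(draws))

# Product formula (1.23): P(B|A) * P(A) with the values from the example.
p_product = Fraction(1, 4) * Fraction(2, 5)

print(p_enumerated, p_product)
```

Both approaches yield the same value, which illustrates that the product formula is a shortcut for the enumeration.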


Example 1.16 In a study, data from a sample of 12 000 persons were collected.
4800 persons in this sample were obese and 3600 suffered from type 2 diabetes. Of the
diabetes sufferers, 2700 were obese. A person is randomly selected from the sample
of 12 000 persons. It happens to be Max. Let A be the event that Max is obese, and B
be the event that Max has type 2 diabetes. Then

P(A) = 0.4, P(B) = 0.3, and P(A|B) = 2700/3600 = 0.75.

1) Hence, the probability that Max is both obese and a type 2 diabetes sufferer is,
by (1.23),

\[ P(A \cap B) = P(A|B)\,P(B) = 0.75 \cdot 0.3 = 0.225. \]

2) To see whether being obese increases the probability of contracting type 2 diabetes,
the probability P(B|A) has to be determined. From the right equation of (1.23),

\[ P(A \cap B) = 0.225 = P(B|A)\,P(A) = P(B|A) \cdot 0.4. \]

Hence, P(B|A) = 0.5625. Since this exceeds P(B) = 0.3, being obese increases, based on
this study, the probability of contracting type 2 diabetes. □
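The two steps of Example 1.16 amount to applying the product formula (1.23) in both directions. A minimal sketch with the counts from the example (variable names are illustrative assumptions):

```python
from fractions import Fraction

# Sample counts from Example 1.16.
n = 12000
obese = 4800
diabetic = 3600
obese_and_diabetic = 2700

p_A = Fraction(obese, n)                               # P(A)
p_B = Fraction(diabetic, n)                            # P(B)
p_A_given_B = Fraction(obese_and_diabetic, diabetic)   # P(A|B)

# Left equation of (1.23): P(A ∩ B) = P(A|B) P(B).
p_AB = p_A_given_B * p_B

# Right equation of (1.23), solved for P(B|A).
p_B_given_A = p_AB / p_A

print(float(p_AB), float(p_B_given_A))
```

The joint probability obtained this way agrees with the direct count 2700/12000, which is a useful consistency check on the data.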


1.4.2 Total Probability Rule and Bayes' Theorem 


Frequently, several mutually exclusive conditions influence the occurrence
of a random event A. All of these conditions are known, but it is not known
which of them is in effect. However, the probability with which each of these
conditions is in effect at the time point of interest is known.
Under these assumptions, a formula for the probability of A will be derived. But first
the procedure is illustrated by an example.


Example 1.17 A machine is subject to two stress levels 1 (event \(B_1\)) and 2 (event
\(B_2\)) with respective probabilities 0.8 and 0.2. Stress levels can be determined e.g. by
different production conditions such as speed, pressure, or humidity. It is supposed that
the stress level does not change during a fixed working period (hour, day). Given
stress level 1 or 2, the machine will fail during a working period with probability 0.3
or 0.6, respectively. Hence, with A denoting the event that the machine fails during a
working period,

\[ P(A|B_1) = 0.3, \quad P(A|B_2) = 0.6. \]

Since the events \(B_1\) and \(B_2\) are disjoint (mutually exclusive) and \(\Omega = B_1 \cup B_2\) is the
certain event, A can be represented as

\[ A = A \cap \Omega = A \cap (B_1 \cup B_2) = (A \cap B_1) \cup (A \cap B_2). \]

The events \(A \cap B_1\) and \(A \cap B_2\) are disjoint so that by formula (1.11)

\[ P(A) = P(A \cap B_1) + P(A \cap B_2). \]

By applying (1.23) to each of the two terms on the right-hand side of this formula,

\[ P(A) = P(A|B_1)\,P(B_1) + P(A|B_2)\,P(B_2) = 0.3 \cdot 0.8 + 0.6 \cdot 0.2 = 0.36. \]

Thus, without information on the respective stress level, the failure probability of the
machine in the working period is 0.36. □
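The calculation in Example 1.17 is a weighted average of the conditional failure probabilities. The sketch below wraps it in a small helper; the function name `total_probability` and the input check are assumptions made here for illustration.

```python
def total_probability(priors, likelihoods):
    """Return sum_i P(A|B_i) * P(B_i) for mutually exclusive, exhaustive B_i.

    priors      -- probabilities P(B_i), expected to sum to 1
    likelihoods -- conditional probabilities P(A|B_i), in the same order
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-12:
        raise ValueError("the probabilities of the conditions must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))


# Example 1.17: stress levels 1 and 2 with probabilities 0.8 and 0.2.
p_failure = total_probability([0.8, 0.2], [0.3, 0.6])
print(round(p_failure, 4))
```

The check on the priors reflects the assumption that the conditions are exhaustive; without it, the weighted sum would not be a probability of A.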


Now the principle, illustrated by this example, is formulated more generally: 
Definition 1.4 The set of random events \(\{B_1, B_2, \ldots, B_n\}\), \(n \le \infty\), is an exhaustive set
of random events for \(\Omega\) if

\[ \Omega = \bigcup_{i=1}^{n} B_i, \]

and it is a mutually disjoint set of events if

\[ B_i \cap B_j = \emptyset, \quad i \ne j, \quad i, j = 1, 2, \ldots, n. \]

A mutually disjoint and exhaustive (for \(\Omega\)) set of events is called a partition of \(\Omega\). ●


Figure 1.5 Partition of a sample space 


Let \(\{B_1, B_2, \ldots, B_n\}\) be an exhaustive and mutually disjoint set of events with
property \(P(B_i) > 0\) for all \(i = 1, 2, \ldots, n\), and let A be an event with P(A) > 0. Then A
can be represented as follows (see Figure 1.5):

\[ A = \bigcup_{i=1}^{n} (A \cap B_i). \]

Since the \(B_i\) are disjoint, the conjunctions \(A \cap B_i\) are disjoint as well. Formula
(1.10) is applicable and yields \(P(A) = \sum_{i=1}^{n} P(A \cap B_i)\). Now formula (1.23) applied
to all n probabilities \(P(A \cap B_i)\) yields

\[ P(A) = \sum_{i=1}^{n} P(A|B_i)\,P(B_i). \tag{1.24} \]

This result is called the Formula of total probability or the Total probability rule.
Moreover, formulas (1.22) and (1.23) yield

\[ P(B_i|A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(A \cap B_i)}{P(A)} = \frac{P(A|B_i)\,P(B_i)}{P(A)}. \]

If P(A) is replaced with its representation (1.24), then

\[ P(B_i|A) = \frac{P(A|B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A|B_j)\,P(B_j)}, \quad i = 1, 2, \ldots, n. \tag{1.25} \]


Formula (1.25) is called Bayes’ theorem or Formula of Bayes. For obvious reasons, 
the probabilities P(B;) are called a priori probabilities and the conditional probabili- 
ties P(B;| A) a posteriori probabilities. 
Example 1.18 The manufacturers \(M_1\), \(M_2\), and \(M_3\) delivered to a supermarket a total
of 1000 fluorescent tubes of the same type with shares 200, 300, and 500, respectively.
In these shares, there are, in this order, 12, 9, and 5 defective tubes.


1) What is the probability that a randomly chosen tube is not defective?

2) What is the probability that a defective tube had been produced by \(M_i\), i = 1, 2, 3?

Let events A and \(B_i\) be introduced as follows:

A = 'A tube, randomly chosen from the whole delivery, is defective.'

\(B_i\) = 'A tube, randomly chosen from the whole delivery, is from \(M_i\)', i = 1, 2, 3.


According to the figures given:

\[ P(B_1) = 0.2, \quad P(B_2) = 0.3, \quad P(B_3) = 0.5, \]
\[ P(A|B_1) = 12/200 = 0.06, \quad P(A|B_2) = 9/300 = 0.03, \quad P(A|B_3) = 5/500 = 0.01. \]


\(\{B_1, B_2, B_3\}\) is a set of exhaustive and mutually disjoint events, since there are no
other manufacturers delivering tubes of this brand to that supermarket and no two
manufacturers can have produced one and the same tube.


1) The total probability rule (1.24) yields the probability that a randomly chosen tube
is defective:

\[ P(A) = 0.06 \cdot 0.2 + 0.03 \cdot 0.3 + 0.01 \cdot 0.5 = 0.026. \]

Hence, a randomly chosen tube is not defective with probability
\(P(\overline{A}) = 1 - 0.026 = 0.974\).


2) Bayes' theorem (1.25) gives the desired probabilities:

\[ P(B_1|A) = \frac{P(A|B_1)\,P(B_1)}{P(A)} = \frac{0.06 \cdot 0.2}{0.026} = 0.4615, \]
\[ P(B_2|A) = \frac{P(A|B_2)\,P(B_2)}{P(A)} = \frac{0.03 \cdot 0.3}{0.026} = 0.3462, \]
\[ P(B_3|A) = \frac{P(A|B_3)\,P(B_3)}{P(A)} = \frac{0.01 \cdot 0.5}{0.026} = 0.1923. \]


Thus, although manufacturer \(M_3\) has by far the largest proportion of tubes in the
delivery, the high quality of its tubes means that a defective tube is least likely to
have been produced by this manufacturer. □
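The two steps of Example 1.18 (total probability, then Bayes' theorem) can be combined in one small routine. The following sketch uses exact fractions built from the delivery figures; the function name `posteriors` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Return P(A) by (1.24) and the list of P(B_i|A) by (1.25)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A|B_i) P(B_i)
    p_a = sum(joint)                                      # total probability
    return p_a, [j / p_a for j in joint]


# Shares of the manufacturers and their defect rates (Example 1.18).
priors = [Fraction(200, 1000), Fraction(300, 1000), Fraction(500, 1000)]
likelihoods = [Fraction(12, 200), Fraction(9, 300), Fraction(5, 500)]

p_defective, post = posteriors(priors, likelihoods)
print(float(p_defective), [round(float(q), 4) for q in post])
```

By construction the a posteriori probabilities sum to 1, which serves as a quick plausibility check of any application of (1.25).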


Example 1.19 1% of the population in a country are HIV-positive. A test procedure
for diagnosing whether a person is HIV-positive indicates with probability 0.98 that
the person is HIV-positive if he/she is indeed HIV-positive, and with probability
0.96 that the person is not HIV-positive if he/she is not HIV-positive.


1) What is the probability that a test person is HIV-positive if the test indicates that?
To solve the problem, random events A and B are introduced:

A = 'The test indicates that a person is HIV-positive.'

B = 'A test person is HIV-positive.'

Then, from the figures given,

\[ P(B) = 0.01, \quad P(\overline{B}) = 0.99, \]
\[ P(A|B) = 0.98, \quad P(\overline{A}|B) = 0.02, \quad P(\overline{A}|\overline{B}) = 0.96, \quad P(A|\overline{B}) = 0.04. \]

Since \(\{B, \overline{B}\}\) is an exhaustive and disjoint set of events, the total probability rule
(1.24) is applicable to determining P(A):

\[ P(A) = P(A|B)\,P(B) + P(A|\overline{B})\,P(\overline{B}) = 0.98 \cdot 0.01 + 0.04 \cdot 0.99 = 0.0494. \]

Bayes' theorem (1.25) yields the desired probability P(B|A):

\[ P(B|A) = \frac{P(A|B)\,P(B)}{P(A)} = \frac{0.98 \cdot 0.01}{0.0494} = 0.1984. \]

Although the initial parameters of the test look acceptable, this result is quite
unsatisfactory: In view of \(P(\overline{B}|A) = 0.8016\), about 80% of the test persons who learn
that the test procedure indicates they are HIV-positive are in fact HIV-negative. In such
a situation the test has to be repeated several times. The reason for this unsatisfactory
numerical result is that only a small percentage of the population is HIV-positive.


2) The probability that a person is HIV-negative if the test procedure indicates this is

\[ P(\overline{B}|\overline{A}) = \frac{P(\overline{A}|\overline{B})\,P(\overline{B})}{P(\overline{A})} = \frac{0.96 \cdot 0.99}{1 - 0.0494} = 0.99979. \]

This result is, of course, an excellent feature of the test. □
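The dependence of the result in Example 1.19 on the share of HIV-positive persons can be explored with a few lines of code. This is a sketch under the assumptions of the example; the parameter names `prevalence`, `sensitivity`, and `specificity` are common terms for P(B), P(A|B), and the probability of a negative indication for a negative person, but they are not used in the text.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return P(A), P(B|A) and P(not B | not A) for a diagnostic test.

    A = 'test indicates positive', B = 'person is positive'.
    """
    # Total probability rule (1.24) over the partition {B, not B}.
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # Bayes' theorem (1.25) for a positive and for a negative indication.
    p_b_given_a = sensitivity * prevalence / p_pos
    p_notb_given_nota = specificity * (1 - prevalence) / (1 - p_pos)
    return p_pos, p_b_given_a, p_notb_given_nota


p_pos, ppv, npv = predictive_values(0.01, 0.98, 0.96)
print(round(p_pos, 4), round(ppv, 4), round(npv, 5))
```

Calling the function with a larger first argument shows how P(B|A) grows when a larger share of the tested population is positive, which is the effect named at the end of part 1).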


1.4.3 Independent Random Events 


If a die is thrown twice, then the result of the first throw does not influence the result 
of the second throw and vice versa. If you have not won in the weekly lottery during 
the past 20 years, then this bad luck will not increase or decrease your chance to win 
in the lottery the following week. An aircraft crash over the Pacific for technical 
reasons has no connection to the crash of an aircraft over the Atlantic for technical 
reasons the same day. Thus, there are random events which do not at all influence 
each other. Events like that are called independent (of each other). Of course, for a 
quantitative probabilistic analysis a more accurate definition is required. 

If the occurrence of a random event B has no influence on the occurrence of a random
event A, then the probability of the occurrence of A will not be changed by the
additional information that B has occurred, i.e.

\[ P(A) = P(A|B) = \frac{P(A \cap B)}{P(B)}. \tag{1.26} \]

This motivates the definition of independent random events:


Definition 1.5 Two random events A and B are called independent if

\[ P(A \cap B) = P(A)\,P(B). \tag{1.27} \]

This is the product formula for independent events A and B. Obviously, (1.27) is also
valid for P(B) = 0 and/or P(A) = 0. Hence, defining independence of two random
events by (1.27) is preferred to defining independence by formula (1.26).


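If A and B are independent random events, then the pairs A and \(\overline{B}\), \(\overline{A}\) and B, as well
as \(\overline{A}\) and \(\overline{B}\) are independent, too. That means relation (1.27) implies, e.g.,

\[ P(A \cap \overline{B}) = P(A)\,P(\overline{B}). \]

This can be proved as follows:

\[ \begin{aligned}
P(A \cap \overline{B}) &= P(A \cap (\Omega \setminus B)) = P((A \cap \Omega) \setminus (A \cap B)) = P(A \setminus (A \cap B)) \\
&= P(A) - P(A \cap B) = P(A) - P(A)\,P(B) \\
&= P(A)\,[1 - P(B)] = P(A)\,P(\overline{B}).
\end{aligned} \]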


The generalization of the independence property to more than two random events is
not obvious. The pairwise independence of \(n \ge 2\) events is defined as follows:


The events \(A_1, A_2, \ldots, A_n\) are called pairwise independent if for each pair \((A_i, A_j)\)

\[ P(A_i \cap A_j) = P(A_i)\,P(A_j), \quad i \ne j, \quad i, j = 1, 2, \ldots, n. \]

A more general definition of the independence of n events is the following one:

Definition 1.6 The random events \(A_1, A_2, \ldots, A_n\) are called completely independent
or simply independent if for all k = 2, 3, ..., n,

\[ P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})\,P(A_{i_2}) \cdots P(A_{i_k}) \tag{1.28} \]

for any subset \(\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}\) of \(\{A_1, A_2, \ldots, A_n\}\) with \(1 \le i_1 < i_2 < \cdots < i_k \le n\). ●


Thus, to verify the complete independence of n random events, one has to check

\[ \sum_{k=2}^{n} \binom{n}{k} = 2^n - n - 1 \]

conditions. Luckily, in most applications it is sufficient to verify the case k = n:

\[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n). \tag{1.29} \]
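The number of product conditions required by Definition 1.6 grows quickly with n. The following short sketch counts the subsets of size k = 2, ..., n directly and compares the count with the closed form; the function name is an illustrative assumption.

```python
from math import comb


def number_of_conditions(n):
    """Count the subsets with k = 2, ..., n elements of a set of n events."""
    return sum(comb(n, k) for k in range(2, n + 1))


for n in (2, 3, 5, 10):
    # Closed form: all 2**n subsets minus the empty set and the n singletons.
    assert number_of_conditions(n) == 2**n - n - 1
    print(n, number_of_conditions(n))
```

The closed form follows because the binomial coefficients for k = 0, 1, ..., n sum to 2^n, and the terms for k = 0 and k = 1 contribute 1 and n.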


The complete independence is a stronger property than the pairwise independence.
For this reason it is interesting to consider an example in which the \(A_1, A_2, \ldots, A_n\) are
pairwise independent, but not completely independent.


Example 1.20 The dice \(D_1\) and \(D_2\) are thrown. The corresponding sample space
consists of 36 elementary events: \(\Omega = \{(i, j);\ i, j = 1, 2, \ldots, 6\}\). Let


\(A_1\) = '\(D_1\) shows a 1' = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)},
\(A_2\) = '\(D_2\) shows a 1' = {(1,1), (2,1), (3,1), (4,1), (5,1), (6,1)},
\(A_3\) = 'both \(D_1\) and \(D_2\) show the same number' = {(i,i); i = 1, 2, ..., 6}.

Since the \(A_i\) each contain 6 elementary events,

\[ P(A_1) = P(A_2) = P(A_3) = 1/6. \]

Any two of the \(A_i\) have only one elementary event in common, namely (1, 1). Hence,

\[ P(A_1 \cap A_2) = P(A_1 \cap A_3) = P(A_2 \cap A_3) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}. \]

Therefore, the \(A_i\) are pairwise independent. However,

\[ A_1 \cap A_2 \cap A_3 = \{(1, 1)\}. \]

Hence,

\[ P(A_1 \cap A_2 \cap A_3) = \frac{1}{36} \ne P(A_1)\,P(A_2)\,P(A_3) = \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{216}. \]
□
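Example 1.20 can be verified by brute force over the 36 equally likely outcomes. The sketch below checks the product rule (1.28) for every subset of the three events; the helper names `prob` and `product_rule_holds` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space of two dice: 36 equally likely elementary events.
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability of an event given as a subset of omega."""
    return Fraction(len(event), len(omega))


A1 = {(i, j) for (i, j) in omega if i == 1}   # D1 shows a 1
A2 = {(i, j) for (i, j) in omega if j == 1}   # D2 shows a 1
A3 = {(i, j) for (i, j) in omega if i == j}   # both show the same number
events = [A1, A2, A3]


def product_rule_holds(subset):
    """Check P(intersection) == product of the probabilities, as in (1.28)."""
    intersection = set.intersection(*subset)
    rhs = Fraction(1)
    for event in subset:
        rhs *= prob(event)
    return prob(intersection) == rhs


pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
complete = all(product_rule_holds(sub)
               for k in range(2, len(events) + 1)
               for sub in combinations(events, k))
print(pairwise, complete)
```

The check for k = 2 succeeds for all three pairs, while the single condition for k = 3 fails, which is exactly the distinction between pairwise and complete independence.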


Example 1.21 (Chevalier de Méré) What is more likely: 1) to get at least one 6, 
when throwing four dice simultaneously (event A), or 2) to get the outcome (6,6) at 
least once, when throwing two dice 24 times simultaneously (event B)? 


The complementary events to A and B are:
\(\overline{A}\) = 'none of the dice shows a 6, when four dice are thrown simultaneously,'
\(\overline{B}\) = 'the outcome (6,6) does not occur, when two dice are thrown 24 times.'


The four results obtained by throwing four dice, as well as the results obtained by
repeatedly throwing two dice, are independent of each other. Hence, since the
probability of getting no 6 when throwing one die is 5/6, formula (1.29) with n = 4 yields

\[ P(\overline{A}) = (5/6)^4. \]

The probability of not getting the result (6,6) when throwing two dice is 35/36. Hence,
formula (1.29) with n = 24 yields the probability

\[ P(\overline{B}) = (35/36)^{24}. \]

Thus, the desired probabilities are

\[ P(A) = 1 - (5/6)^4 \approx 0.518, \quad P(B) = 1 - (35/36)^{24} \approx 0.491. \]
□
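The two probabilities of Example 1.21 are easily evaluated exactly. A minimal sketch, assuming fair dice and independent throws as in the example (variable names are illustrative):

```python
from fractions import Fraction

# Complement rule combined with the product formula (1.29).
p_no_six = Fraction(5, 6)             # one die shows no 6
p_no_double_six = Fraction(35, 36)    # one throw of two dice is not (6,6)

p_A = 1 - p_no_six ** 4               # at least one 6 with four dice
p_B = 1 - p_no_double_six ** 24       # at least one (6,6) in 24 throws

print(round(float(p_A), 3), round(float(p_B), 3))
```

The comparison shows that event A is slightly more likely than not, whereas event B is slightly less likely than not.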
Example 1.22 In a set of traffic lights, the color 'red' (as well as green and yellow) is
indicated by two bulbs which operate independently of each other. Color 'red' is
clearly visible if at least one bulb is operating.


What is the probability that in the time interval [0, 200 hours] color 'red' is visible if
it is known that a bulb survives this interval with probability 0.95?


To answer this question, let
A = 'bulb 1 does not fail in [0, 200],' B = 'bulb 2 does not fail in [0, 200].'
The event of interest is
\(C = A \cup B\) = 'red light is clearly visible in [0, 200].'
By formula (1.16),

\[ P(C) = P(A \cup B) = P(A) + P(B) - P(A \cap B). \]

Since A and B are independent,

\[ P(C) = P(A) + P(B) - P(A)\,P(B) = 0.95 + 0.95 - (0.95)^2. \]

Thus, the desired probability is

\[ P(C) = 0.9975. \]


Another possibility of solving this problem is to apply the Rules of de Morgan (1.1):

\[ P(\overline{C}) = P(\overline{A \cup B}) = P(\overline{A} \cap \overline{B}) = P(\overline{A})\,P(\overline{B}) = (1 - 0.95)(1 - 0.95) = 0.0025 \]

so that \(P(C) = 1 - P(\overline{C}) = 0.9975\). □
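Both solution routes of Example 1.22 can be compared in code. The sketch below also allows more than two bulbs, which goes slightly beyond the example and is an assumption made here for illustration; the function name is hypothetical.

```python
from fractions import Fraction


def visible_probability(p_survive, bulbs=2):
    """P(at least one of `bulbs` independent bulbs survives the interval)."""
    # De Morgan route: 1 minus the probability that all bulbs fail.
    return 1 - (1 - p_survive) ** bulbs


p = Fraction(95, 100)

via_union = p + p - p * p                  # formula (1.16) with independence
via_complement = visible_probability(p)    # complement route for two bulbs

print(float(via_union), float(via_complement))
```

For more than two components the complement route stays a one-line formula, while the union route needs the full inclusion-exclusion expansion.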
Figure 1.6 Diagram of a '2 out of 3 system'


Example 1.23 ('2 out of 3 system') A system S consists of 3 independently operating
subsystems \(S_1\), \(S_2\), and \(S_3\). The system operates if and only if at least 2 of its
subsystems operate. Figure 1.6 illustrates the situation: S operates if there is at least
one path with two operating subsystems (symbolized by rectangles) from the entrance
node en to the exit node ex. The following application may serve as an illustration: The
pressure in a high-pressure tank is indicated by 3 gauges. If at least 2 gauges show the
same pressure, then this value can be accepted as the true one. (But for safety reasons
the failed gauge has to be replaced immediately.)


At a given time point \(t_0\), subsystem \(S_i\) is operating with probability \(p_i\), i = 1, 2, 3.
What is the probability \(p_S\) that the system S is operating at time point \(t_0\)?


Let \(A_S\) be the event that S is working at time point \(t_0\), and \(A_i\) be the event that \(S_i\) is
operating at time point \(t_0\). Then,

\[ A_S = (A_1 \cap A_2) \cup (A_1 \cap A_3) \cup (A_2 \cap A_3). \]

With \(A = A_1 \cap A_2\), \(B = A_1 \cap A_3\), and \(C = A_2 \cap A_3\), formula (1.17) can be directly
applied. Since \(A \cap B = A \cap C = B \cap C = A \cap B \cap C = A_1 \cap A_2 \cap A_3\), it yields the
following representation of \(P(A_S)\):

\[ P(A_S) = P(A_1 \cap A_2) + P(A_1 \cap A_3) + P(A_2 \cap A_3) - 2P(A_1 \cap A_2 \cap A_3). \]

In view of the independence of \(A_1\), \(A_2\), and \(A_3\), this probability can be written as

\[ P(A_S) = P(A_1)P(A_2) + P(A_1)P(A_3) + P(A_2)P(A_3) - 2P(A_1)P(A_2)P(A_3) \]

or

\[ P(A_S) = p_1 p_2 + p_1 p_3 + p_2 p_3 - 2 p_1 p_2 p_3. \]

In particular, if \(p = p_i\), i = 1, 2, 3, then

\[ P(A_S) = (3 - 2p)\,p^2. \]
□
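The formula of Example 1.23 can be checked by summing over all eight operating states of the three subsystems. This is a sketch under the independence assumption of the example; the function names and the test values 0.9, 0.8, 0.7 are chosen here for illustration.

```python
from itertools import product


def two_out_of_three(p1, p2, p3):
    """Sum the probabilities of all states with at least 2 operating subsystems."""
    total = 0.0
    for states in product([0, 1], repeat=3):       # 1 = operating, 0 = failed
        if sum(states) >= 2:
            state_prob = 1.0
            for works, p in zip(states, (p1, p2, p3)):
                state_prob *= p if works else 1 - p
            total += state_prob
    return total


def closed_form(p1, p2, p3):
    """Closed-form expression for P(A_S) derived in Example 1.23."""
    return p1 * p2 + p1 * p3 + p2 * p3 - 2 * p1 * p2 * p3


print(two_out_of_three(0.9, 0.8, 0.7), closed_form(0.9, 0.8, 0.7))
```

The enumeration generalizes directly to 'k out of n' systems by changing the threshold and the number of subsystems, whereas the closed form is specific to the 2 out of 3 case.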


Disjoint and independent random events are causally not connected. Nevertheless, 
sometimes there is confusion about their meaning and use. This may be due to the 
formal analogy between their properties: 


If the random events \(A_1, A_2, \ldots, A_n\) are disjoint, then, by formula (1.10),

\[ P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n). \]

If the random events \(A_1, A_2, \ldots, A_n\) are independent, then, by formula (1.29),

\[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n). \]


1.5 EXERCISES 


Sections 1.1—1.3 
1.1) A random experiment consists of simultaneously flipping three coins. 
(1) What is the corresponding sample space? 


(2) Give the following events in terms of elementary events: 
A =‘head appears at least two times,' B = 'head appears not more than once,' and 
C='no head appears.' 


(3) Characterize verbally the complementary events of A, B, and C. 


1.2) A random experiment consists of throwing a die until the first appearance of a '6'.
What is the corresponding sample space? 


1.3) Castings are produced weighing either 1, 5, 10, or 20 kg. Let A, B, and C be the
events that a casting weighs 1 or 5 kg, exactly 10 kg, and at least 10 kg, respectively.


Characterize verbally the events AB, AUB, ANC, and (AUB)AC. 


1.4) Three randomly chosen persons are to be tested for the presence of gene g. 
Three random events are introduced: 

A ='none of them has gene g,' 

B = 'at least one of them has gene g,' 

C = 'not more than one of them has gene g'.

Determine the corresponding sample space \(\Omega\) and characterize the events
AB, BUC, and BAC by elementary events. 
1.5) Under which conditions are the following relations between events A and B true:
(1) \(A \cap B = \Omega\), (2) \(A \cup B = \Omega\), (3) \(A \cup B = A \cap B\)?


1.6) Visualize by a Venn diagram whether the following relations between random 
events A, B, and C are true: 


(VNMANBYUQO=40NBUANO, 

(2) (ANB) U(ANB)=A, 

(3) AUB=BU(ANB). 

1.7) (1) Verify by a Venn diagram that for three random events A, B, and C the
following relation is true: \((A \setminus B) \cap C = (A \cap C) \setminus (B \cap C)\).

(2) Is the relation \((A \cap B) \setminus C = (A \setminus C) \cap (B \setminus C)\) true as well?


1.8) The random events A and B belong to a \(\sigma\)-algebra E.
What other events, generated by A and B, must belong to E (see Definition 1.2)?


1.9) Two dice \(D_1\) and \(D_2\) are simultaneously thrown. The respective outcomes of \(D_1\)
and \(D_2\) are \(\omega_1\) and \(\omega_2\). Thus, the sample space is \(\Omega = \{(\omega_1, \omega_2);\ \omega_1, \omega_2 = 1, 2, \ldots, 6\}\).
Let the events A and B be defined as follows:

A = 'The outcome of \(D_1\) is even and the outcome of \(D_2\) is odd',

B = 'The outcomes of \(D_1\) and \(D_2\) are both even'.


What is the smallest \(\sigma\)-algebra E generated by A and B ('smallest' with regard to the
number of elements in E)?


1.10) Let A and B be two disjoint random events, \(A \subset \Omega\), \(B \subset \Omega\).


Check whether the set of events {A, B, AB, and AB} is (1) an exhaustive and 
(2) a disjoint set of events (Venn diagram). 


1.11) A coin is flipped 5 times in a row. What is the probability of the event A that 
‘head' appears at least 3 times one after the other? 


1.12) A die is thrown. Let A = {1,2,3} and B = {3,4,6} be two random events. 
Determine the probabilities \(P(A \cup B)\), \(P(A \cap B)\), and \(P(B \setminus A)\).


1.13) A die is thrown 3 times. Determine the probability of the event A that the 
resulting sequence of three integers is strictly increasing. 


1.14) Two dice are thrown simultaneously. Let \((\omega_1, \omega_2)\) be an outcome of this random
experiment, A = '\(\omega_1 + \omega_2 \le 10\)' and B = '\(\omega_1 \cdot \omega_2 \ge 19\)'.
Determine the probability \(P(A \cap B)\).
1.15) What is the probability \(p_3\) to get 3 numbers right with 1 ticket in the '6 out of
49' number lottery?


1.16) A sample of 300 students showed the following results with regard to physical 
fitness and body weight: 


                          weight [kg]
fitness           < 60     [60, 80]     > 80

good               48         64         11
satisfactory       22         42         29
bad                19         17         48


One student is randomly chosen. It happens to be Paul. 
(1) What is the probability that the fitness of Paul is satisfactory? 
(2) What is the probability that the weight of Paul is greater than 80 kg? 


(3) What is the probability that the fitness of Paul is bad and that his weight is less 
than 60 kg? 


1.17) Paul writes four letters and addresses the four accompanying envelopes. After 
having had a bottle of whisky, he puts the letters randomly into the envelopes.
Determine the probabilities \(p_k\) that k letters are in the 'correct' envelopes, k = 0, 1, 2, 3.


1.18) A straight stick is broken at two randomly chosen positions. What is the pro- 
bability that the resulting three parts of the stick allow the construction of a triangle? 


1.19) Two hikers climb to the top of a mountain from different directions. Their arriv- 
al time points are between 9:00 and 10:00 a.m., and they stay on the top for 10 and 
20 minutes, respectively. For each hiker, every time point between 9:00 and 10:00 has
the same chance to be the arrival time. What is the probability that the hikers meet on 
the top? 


1.20) A fence consists of horizontal and vertical wooden rods with a distance of 10cm 
between them (measured from the center of the rods). The rods have a circular sec- 
tional view with a diameter of 2cm. Thus, the arising squares have an edge length of 
8cm. Children throw balls with a diameter of 5cm horizontally at the fence. What is 
the probability that a ball passes the fence without touching the rods? 


1.21) Determine the probability that the quadratic equation

\[ x^2 + 2\sqrt{a}\,x = b - 1 \]

does not have a real solution if the pair (a, b) is randomly chosen from the quarter
circle \(\{(a, b);\ a, b > 0,\ a^2 + b^2 < 1\}\).
1.22) Let A and B be disjoint events with P(A) = 0.3 and P(B) = 0.45. Determine the
probabilities P(A UB), P(AU B), P(AUB), and P(A B). 


1.23) Let P(A 4 B) = 0.3 and P(B) = 0.6. Determine P(A U B). 


1.24) Is it possible that for two events A and B with P(A) = 0.4 and P(B) = 0.2 the
relation \(P(A \cap B) = 0.3\) is true?


1.25) Check whether for 3 arbitrary random events A, B, and C the following con- 
stellations of probabilities can be true: 


(1) P(A) = 0.6, P(A B) = 0.2, and P(A OB) = 0.5, 
(2) P(A) = 0.6, P(B) = 0.4, P(A AB) =0, and PPANBOC)=0.1, 
(3) P(AUBUC) = 0.68 and P(ANB)=P(ANC)=1. 


1.26) Show that for two arbitrary random events A and B the following inequalities
are true: \(P(A \cap B) \le P(A) \le P(A \cup B) \le P(A) + P(B)\).


1.27) Let A, B, and C be 3 arbitrary random events. 


(1) Express the event 'A occurs, but B and C do not occur’ in terms of suitable rela- 
tions between these events and their complements. 


(2) Prove: the probability of the event 'exactly one of the events A, B, or C occurs' is

\[ P(A) + P(B) + P(C) - 2P(A \cap B) - 2P(A \cap C) - 2P(B \cap C) + 3P(A \cap B \cap C). \]


Section 1.4 


1.28) Two dice are simultaneously thrown. The result is \((\omega_1, \omega_2)\). What is the
probability p of the event '\(\omega_2 = 6\)' on condition that '\(\omega_1 + \omega_2 = 8\)'?


1.29) Two dice are simultaneously thrown. By means of formula (1.24) determine 
the probability p that the dice show the same number. 


1.30) A publishing house offers a new book as standard or luxury edition and with or 
without a CD. The publisher analyzes the first 1000 orders: 


                    luxury edition
                     yes       no

with CD     yes      324       82
            no        48      546


Let A (B) be the random event that a book, randomly chosen from these 1000, is a
luxury one (comes with a CD). (1) Determine the probabilities


P(A), P(B), P(AUB), P(AMB), P(A|B), P(B|A), P(A U BIB), and P(A |B). 
(2) Are the events A and B independent? 
1.31) A manufacturer equips its newly developed car of type Treekill optionally with
or without a tracking device and with or without speed limitation technology and
analyzes the first 1200 orders:


                            speed limitation
                             yes       no

tracking device     yes       74      642
                    no        48      436


Let A (B) be the random event that a car, randomly chosen from these 1200, has speed
limitation (comes with a tracking device).


(1) Calculate the probabilities P(A), P(B), and \(P(A \cap B)\) from the figures in the table.


(2) Based on the probabilities determined under (1), only by using the rules developed
in section 1.3.3, determine the probabilities


P(AUB), P(A|B), P(B|A), P(AU BIB), and P(4|B). 


1.32) A bowl contains m white marbles and n red marbles. A marble is taken ran- 
domly from the bowl and returned to the bowl together with r marbles of the same 
color. This procedure continues to infinity. 


(1) What is the probability that the second marble taken is red? 


(2) What is the probability that the first marble taken is red on condition that the 
second marble taken is red? (This is a variant of Pólya's urn problem.)


1.33) A test procedure for diagnosing faults in circuits indicates no fault with probab- 
ility 0.99 if the circuit is faultless. It indicates a fault with probability 0.90 if the cir- 
cuit is faulty. Let the probability of a circuit to be faulty be 0.02. 


(1) What is the probability that a circuit is faulty if the test procedure indicates a fault? 


(2) What is the probability that a circuit is faultless if the test procedure indicates that 
it is faultless? 


1.34) Suppose 2% of cotton fabric rolls and 3% of nylon fabric rolls contain flaws. 
Of the rolls used by a manufacturer, 70% are cotton and 30% are nylon. 


a) What is the probability that a randomly selected roll used by the manufacturer 
contains flaws? 

b) Given that a randomly selected roll used by the manufacturer does not contain 
flaws, what is the probability that it is a nylon fabric roll? 


1.35) A group of 8 students arrives at an examination. Of these students 1 is very 
well prepared, 2 are well prepared, 3 are satisfactorily prepared, and 2 are insuffi- 
ciently prepared. There is a total of 16 questions. A very well prepared student can 
answer all of them, a well prepared 12, a satisfactorily prepared 8, and an
insufficiently prepared 4. Each student has to draw randomly 4 questions. Student Frank
could answer all the 4 questions. What is the probability that Frank


(1) was very well prepared,
(2) was satisfactorily prepared,
(3) was insufficiently prepared?


1.36) Symbols 0 and 1 are transmitted independently from each other in proportion
1:4. Random noise may cause transmission failures: If a 0 was sent, then a 1 will
arrive at the sink with probability 0.1. If a 1 was sent, then a 0 will arrive at the sink
with probability 0.05 (figure).

[Figure: transmission from transmitter to receiver. A sent 0 is received as 0 with
probability 0.9 and as 1 with probability 0.1; a sent 1 is received as 0 with
probability 0.05 and as 1 with probability 0.95.]

(1) What is the probability that a received symbol is '1'?

(2) '1' has been received. What is the probability that '1' had been sent?

(3) '0' has been received. What is the probability that '1' had been sent? 


1.37) The companies 1, 2, and 3 have 60, 80, and 100 employees with 45, 40, and 25 
women, respectively. In every company, employees have the same chance to be 
retrenched. It is known that a woman had been retrenched (event B). 


What is the probability that she had worked in company 1, 2, and 3, respectively? 


1.38) John needs to take an examination, which is organized as follows: To each 
question 5 answers are given. But John knows the correct answer only with probabil- 
ity 0.6. Thus, with probability 0.4 he has to guess the right answer. In this case, John 
guesses the correct answer with probability 1/5 (that means, he chooses an answer by 
chance). What is the probability that John knew the answer to a question given that 
he did answer the question correctly? 


1.39) A delivery of 25 parts is subject to a quality control according to the following 
scheme: A sample of size 5 is drawn (without replacement of drawn parts). If at least 
one part is faulty, then the delivery is rejected. If all 5 parts are o.k., then they are 
returned to the lot, and a sample of size 10 is randomly taken from the original 25 
parts. The delivery is rejected if at least 1 part out of the 10 is faulty. 

Determine the probabilities that a delivery is accepted on condition that 

(1) the delivery contains 2 defective parts, 


(2) the delivery contains 4 defective parts. 
1.40) The random events \(A_1, A_2, \ldots, A_n\) are assumed to be independent. Show that

\[ P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - (1 - P(A_1))(1 - P(A_2)) \cdots (1 - P(A_n)). \]


1.41) n hunters shoot at a target independently of each other, and each of them hits it 
with probability 0.8. Determine the smallest n with property that the target is hit with 
probability 0.99 by at least one hunter. 


1.42) Starting a car of type Treekill is successful with probability 0.6. What is the 
probability that the driver needs no more than 4 start trials to be able to leave? 


1.43) Let A and B be two subintervals of [0, 1]. A point x is randomly chosen from 
[0,1]. Now A and B can be interpreted as random events, which occur if \(x \in A\) or
\(x \in B\), respectively. Under which condition are A and B independent?


1.44) A tank is shot at by 3 independently acting anti-tank helicopters with one anti- 
tank missile each. Each missile hits the tank with probability 0.6. If the tank is hit by 
1 missile, it is put out of action with probability 0.8. If the tank is hit by at least 2 mis- 
siles, it is put out of action with probability 1. 


What is the probability that the tank is put out of action by this attack? 


1.45) An aircraft is targeted by two independently acting ground-to-air missiles. Each 
missile hits the aircraft with probability 0.6 if these missiles are not being destroyed 
before. The aircraft will crash with probability 1 if being hit by at least one missile. 
On the other hand, the aircraft defends itself by firing one air-to-air missile each at 
the approaching ground-to-air missiles. The air-to-air missiles destroy their respective
targets with probability 0.5.


(1) What is the probability p that the aircraft will crash as a result of this attack?


(2) What is the probability that the aircraft will crash if two independent air-to-air 
missiles are fired at each of the approaching ground-to-air-missiles? 


1.46) The liquid flow in a pipe can be interrupted by two independent valves \(V_1\) and
\(V_2\), which are connected in series (figure). For interrupting the liquid flow it is
sufficient if one valve closes properly. The probability that an interruption is achieved
when necessary is 0.98 for both valves. On the other hand, liquid flow is only possi- 
ble if both valves are open. Switching from 'closed' to 'open' is successful with 
probability 0.99 for each of the valves. 


(1) Determine the probability to be able to interrupt the liquid flow if necessary. 
(2) What is the probability to be able to resume liquid flow if both valves are closed? 


CHAPTER 2 


One-Dimensional Random Variables 


2.1 MOTIVATION AND TERMINOLOGY 


Starting point of Chapter 1 is a random experiment with sample space \(\Omega\), which is the
set of all possible outcomes of the random experiment under consideration, and the
set (\(\sigma\)-algebra) E of all random events, where a random event \(A \in E\) is a subset of
the sample space: \(A \subseteq \Omega\). In this way, together with a probability function P defined
on E, the probability space \([\Omega, E, P]\) is given. In many cases, the outcomes
(elementary events) of random experiments are real numbers (throwing a die, counting the
number of customers arriving per unit time at a service station, counting of wildlife
in a specific area, total number of goals in a soccer match, or measurement of
lifetimes of organisms and technical products). In these cases, the outcomes of a series
of identical random experiments allow an immediate quantitative analysis. However,
when the outcomes are not real numbers, i.e. \(\Omega\) is not a subset of the real axis or the
whole real axis, then such an immediate numerical analysis is not possible. To
overcome this problem, a real number z is assigned to the outcome \(\omega\) by a given
real-valued function g defined on \(\Omega\): \(z = g(\omega)\), \(\omega \in \Omega\).


Examples for situations like that are: 


1) When flipping a coin, the two possible outcomes are \(\omega_1\) = 'head' and \(\omega_2\) = 'tail'. A
'1' is assigned to head and a '0' to tail (for instance).


2) An examination has the outcomes \(\omega_1\) = 'with distinction', \(\omega_2\) = 'very good',
\(\omega_3\) = 'good', \(\omega_4\) = 'satisfactory', and \(\omega_5\) = 'not passed'. The figures '5', '4', ..., '1' (for
instance) are assigned to these verbal evaluations.

3) Even if the outcomes are real numbers, you may be more interested in figures de- 
rived from these numbers. For instance, the outcome is the number n of items you 
have produced during a workday. For the first item you get a financial reward of $10,
for the second of $11, for the third $12, and so on. Then you are first of all interested 
in your total income per working day. 


4) If the outcomes of random experiments are vectors of real numbers, it may be
opportune to assign a real number to these vectors. For instance, if you throw four
dice simultaneously, you get a vector with four components. If you win when the
total sum exceeds a certain amount, then you are not primarily interested in
the four individual results, but in their sum. In this way, you reduce the complexity of
the random experiment.


5) The random experiment consists in testing the quality of 100 spare parts taken randomly
from a delivery. A '1' is assigned to a spare part which meets the requirements,
and a '0' otherwise. The outcome of this experiment is a vector $\omega = (\omega_1, \omega_2, \ldots, \omega_{100})$,
the components $\omega_i$ of which are 0 or 1. Such a vector is not tractable, so you want to
assign a summarizing quality parameter to it to get a random experiment which has a
one-dimensional result. This can be, e.g., the relative frequency of those items in the
sample which meet the requirements:

$$z = g(\omega) = \frac{1}{100} \sum_{i=1}^{100} \omega_i. \qquad (2.1)$$


Basically, application of a real function to the outcomes of a random experiment does
not change the 'nature' of the random experiment, but simply replaces the 'old' sample
space with a 'new' one, which is more suitable for the solution of directly interesting
numerical problems. In cases 1 and 3 to 5 listed above:

1) The sample space {tail, head} is replaced with {0, 1}. 

3) The sample space {0, 1, 2, 3, 4, ...} is replaced with {0, 10, 21, 33, 46,...}. 

4) The sample space $\{(\omega_1, \omega_2, \omega_3, \omega_4);\ \omega_i = 1, 2, \ldots, 6\}$, which consists of $6^4 = 1296$
elementary events of the structure $\omega = (\omega_1, \omega_2, \omega_3, \omega_4)$, is replaced with the sample
space {4, 5, ..., 24}.

5) The sample space consisting of the $2^{100}$ elementary events $\omega = (\omega_1, \omega_2, \ldots, \omega_{100})$,
where each $\omega_i$ is 0 or 1, is reduced by the relative frequency function $g$ given by (2.1) to a
sample space with 101 elementary events:

$$\left\{0, \tfrac{1}{100}, \tfrac{2}{100}, \ldots, \tfrac{99}{100}, 1\right\}.$$
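The replacement of an 'old' sample space by a 'new' one can be made concrete with a small computation. The following is only an illustrative sketch for cases 3 and 4; the names `old_space`, `new_space` and `income` are chosen here and do not occur in the text.

```python
from itertools import product

# Case 4: four dice; the function g maps each outcome vector to its sum.
old_space = list(product(range(1, 7), repeat=4))
new_space = sorted({sum(w) for w in old_space})

# Case 3: total daily income when the k-th item earns 10 + (k - 1) dollars.
def income(n):
    return sum(10 + k for k in range(n))

print(len(old_space))                    # number of elementary events: 1296
print(new_space[0], new_space[-1])       # smallest and largest sum: 4 24
print([income(n) for n in range(5)])     # [0, 10, 21, 33, 46]
```

The new sample spaces are much smaller than the original ones, which is exactly the reduction of complexity described above.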

Since the outcome $\omega$ of a random experiment is not predictable, it is also random
which value the function $g(\omega)$ will assume after the random experiment. Hence,
functions on the sample space of a random experiment are called random variables.
In the end, the concept of a random variable is only a somewhat more abstract formulation
of the concept of a random experiment. But the terminology has changed: One
says on the one hand that as a result of a random experiment an elementary event has
occurred, and on the other hand that a random variable has assumed a value. In this
book (apart from Chapter 12) only real-valued random variables are considered. As is
common in the literature, random variables will be denoted by capital Latin letters,
e.g. $X, Y, Z$, or by Greek letters such as $\zeta, \xi, \eta$.


Let $X$ be a random variable: $X = X(\omega)$, $\omega \in \Omega$. The range $R_X$ of $X$ is the set of all
possible values $X$ can assume. Symbolically: $R_X = X(\Omega)$. The elements of $R_X$ are
called the realizations of $X$ or its values. If there is no doubt about the underlying
random variable, the range is simply denoted as $R$.


A random variable $X$ is a real function on the sample space $\Omega$ of a random experiment.
This function generates a new random experiment, whose sample space is
given by the range $R_X$ of $X$. The probabilistic structure of the new random experiment
is determined by the probabilistic structure of the original one.


When discussing random variables, the original, application-oriented random experiment
will play no explicit role anymore. Thus, a random variable can be considered
to be an abstract formulation of a random experiment. With this in mind, the probability
that $X$ assumes a value out of a set $A$, $A \subseteq R$, is an equivalent formulation for
the probability that the random event $A$ occurs, i.e.

$$P(A) = P(X \in A) = P(\omega,\ X(\omega) \in A).$$
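How the probabilities of the original experiment determine those of the random variable can be sketched in a few lines. The experiment used here (two fair dice, with $X$ the sum of the faces) is an assumption made for illustration only, as are the names `omega`, `prob` and `prob_X_in`.

```python
from fractions import Fraction
from itertools import product

# Original experiment (an assumption for illustration): two fair dice.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, len(omega)) for w in omega}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def prob_X_in(A):
    """P(X in A), computed as the probability of {w : X(w) in A}."""
    return sum(p for w, p in prob.items() if X(w) in A)

print(prob_X_in({7}))          # 1/6
print(prob_X_in({2, 3, 4}))    # 1/6
```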


For one-dimensional random variables $X$, it is sufficient to know the interval probabilities
$P_X(I) = P(X \in I)$ for all intervals $I = (a, b]$, $a < b$, i.e.

$$P(X \in I) = P(a < X \le b) = P(\omega,\ a < X(\omega) \le b). \qquad (2.2)$$

If $R$ is a finite or countably infinite set, then $I = (a, b]$ is simply the set of all those
realizations of $X$ which belong to $I$.


Definition 2.1 The probability distribution or simply distribution of a one-dimensional
random variable $X$ is given by a rule $P_X$, which assigns to every interval
$I = (a, b]$, $a < b$, of the real axis the probabilities (2.2). •


Remark In view of definition 1.2, the probability distribution of any random variable $X$ should
provide probabilities $P(X \in A)$ for any random event $A$ from the sigma algebra $E$ of the
underlying measurable space $[\Omega, E]$, i.e. not only for intervals. This is indeed the case, since from
measure theory it is known that a probability function defined on all intervals also provides
probabilities for all those events which can be generated by finite or countably infinite unions
and intersections of intervals. For this reason, a random variable is called a measurable function
with regard to $[\Omega, E]$. This application-oriented text does not explicitly refer to this
measure-theoretic background and is presented without measure-theoretic terminology.


A random variable $X$ is fully characterized by its range $R_X$ and by its probability
distribution. If a random variable is multidimensional, i.e. its values are n-dimensional
vectors, then its probability distribution is defined by assigning
probabilities to rectangles for n = 2, to rectangular parallelepipeds for
n = 3, and so on.


In chapter 2, only one-dimensional random variables will be considered, i.e., their 
values are scalars. 


The set of all possible values $R_X$ which a random variable $X$ can assume only plays
a minor role compared to its probability distribution. In most cases, this set is determined
by the respective applications; in other cases there prevails a certain arbitrariness.
For instance, the faces of a die can be numbered from 7 to 12; a 3 (2) can be
assigned to an operating (nonoperating) system instead of 1 or 0. Thus, the most
important thing is to find the probability distribution of a random variable.
Fortunately, the probability distribution of a random variable $X$ is fully characterized
by one function, called its (cumulative) distribution function or its probability
distribution function:


Definition 2.2 The probability distribution function (cumulative distribution function
or simply distribution function) $F(x)$ of a random variable $X$ is defined as

$$F(x) = P(X \le x), \quad -\infty < x < +\infty.$$ •


Any distribution function $F(x)$ has the following obvious properties:
1) $F(-\infty) = 0$, $F(+\infty) = 1$, (2.3)
2) $F(x_1) \le F(x_2)$ if $x_1 \le x_2$. (2.4)


On the other hand, every real-valued function $F(x)$ which satisfies the conditions (2.3) and
(2.4) and is, in addition, continuous from the right can be considered the distribution
function of a random variable.


Given the distribution function of $X$, it must be possible to determine the interval
probabilities (2.2). This can be done as follows:


For $a < b$, the event "$X \le b$" is given by the union of two disjoint events:

"$X \le b$" = "$X \le a$" $\cup$ "$a < X \le b$".

Hence, by formula (1.11), $P(X \le b) = P(X \le a) + P(a < X \le b)$, or, equivalently,

$$P(a < X \le b) = F(b) - F(a). \qquad (2.5)$$
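Formula (2.5) translates directly into a computation. As a minimal sketch, assume that $X$ is the outcome of rolling a fair die (this choice and the function names are illustrative, not part of the text):

```python
from fractions import Fraction

def F(x):
    """Distribution function of a fair die: F(x) = P(X <= x)."""
    if x < 1:
        return Fraction(0)
    return Fraction(min(int(x), 6), 6)

def interval_prob(a, b):
    """P(a < X <= b) = F(b) - F(a) for a < b, as in (2.5)."""
    return F(b) - F(a)

print(interval_prob(2, 5))      # P(X in {3, 4, 5}) = 1/2
print(interval_prob(0.5, 1.5))  # P(X = 1) = 1/6
```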


Thus, the cumulative distribution function contains all the information, specified in 
definition 2.1, about the probability distribution of a random variable. Note that defi- 
nition 2.2 refers both to discrete and continuous random variables: 


A random variable $X$ is called discrete if it can assume only finitely or countably
infinitely many values, i.e., its range $R$ is a finite or a countably infinite set. A random
variable $X$ is called continuous if it can assume all values from the whole real
axis, a real half-axis, or at least from a finite interval of the real axis or unions of
finite intervals.


Examples of discrete random variables are:


Number of coin flips until the first appearance of 'head', number of customers arriving
at a service station per hour, number of customers served at a service station per
hour, number of traffic accidents in a specified area per day, number of staff
on sick leave per day, number of rhinos poached in the Krüger National Park per year,
number of exam questions correctly answered by a student, number of sperling errors
in this chapter.


Examples of continuous random variables are:


Length of a chess match, service time of a customer at a service station, lifetimes of
biological and technical systems, repair time of a failed machine, amount of rainfall
per day at a measurement point, measurement errors, sulfur dioxide content of the air
(with regard to time and location), daily stock market fluctuations.


2.2 DISCRETE RANDOM VARIABLES


2.2.1 Probability Distribution and Distribution Parameters 


Let $X$ be a discrete random variable with range $R = \{x_0, x_1, \ldots\}$. The probability
distribution of $X$ is given by a probability mass function $f(x)$. This function assigns to
each realization of $X$ its probability $p_i = f(x_i)$; $i = 0, 1, \ldots$. Without loss of generality
it can be assumed that each $p_i$ is positive. Otherwise, an $x_i$ with $f(x_i) = 0$ could
be deleted from $R$. Let $A_i$ = "$X = x_i$" be the random event that $X$ assumes value $x_i$.
The $A_i$ are mutually disjoint events, since $X$ cannot assume two different realizations
at the same time. The union of all $A_i$,

$$\bigcup_{i=0}^{\infty} A_i,$$

is the certain event $\Omega$, since $X$ must take on one of its realizations. (A random experiment
must have an outcome.) Taking into account (1.9), a probability mass function
$f(x)$ has two characteristic properties:

$$1)\ f(x_i) > 0, \qquad 2)\ \sum_{i=0}^{\infty} f(x_i) = 1. \qquad (2.6)$$

Every function $f(x)$ having these two properties can be considered to be the probability
mass function of a discrete random variable. By means of $f(x)$, the probability
distribution function of $X$, introduced in definition 2.2, can be written as follows:


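$$F(x) = \begin{cases} 0 & \text{if } x < x_0, \\ \sum_{\{x_i,\ x_i \le x\}} f(x_i) & \text{if } x_0 \le x. \end{cases}$$

With $p_i = f(x_i)$, an equivalent representation of $F(x)$ is

$$F(x) = P(X \le x) = \begin{cases} 0 & \text{for } x < x_0, \\ \sum_{i=0}^{k} p_i & \text{for } x_k \le x < x_{k+1},\ k = 0, 1, 2, \ldots. \end{cases}$$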


Figure 2.1 shows the typical graph of the distribution function of a discrete random
variable $X$ in terms of the cumulative probabilities $s_k$:

$$s_k = p_0 + p_1 + \cdots + p_k, \quad k = 0, 1, \ldots,$$

or $s_k = F(x_k) = f(x_0) + f(x_1) + \cdots + f(x_k)$.

Figure 2.1 Graph of the distribution function of an arbitrary discrete random variable


Thus, the distribution function of a discrete random variable is a piecewise constant
function (step function) with jumps of sizes

$$p_i = P(X = x_i) = F(x_i) - F(x_i - 0), \quad i = 0, 1, \ldots,$$

where $F(x_i - 0)$ denotes the limit of $F$ from the left at $x_i$.
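The step-function character of $F$ is easy to see in code. The following sketch uses a hypothetical distribution on {0, 1, 2, 3} (not taken from the text) and evaluates $F(x)$ by locating $x$ among the sorted realizations; the helper names are chosen for this illustration.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Hypothetical distribution: realizations x_i in increasing order with p_i > 0.
xs = [0, 1, 2, 3]
ps = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
cum = list(accumulate(ps))          # cumulative probabilities s_0, s_1, ...

def F(x):
    """Step function F(x) = P(X <= x): sum of all p_i with x_i <= x."""
    k = bisect_right(xs, x)         # number of realizations x_i <= x
    return cum[k - 1] if k > 0 else Fraction(0)

print(F(-1), F(0), F(1.5), F(3))    # 0 1/10 3/10 1
```

Between two neighboring realizations the function stays constant, and at each realization it jumps by the corresponding probability.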


The probability mass function of a random variable $X$ as well as its distribution function
can be identified with the probability distribution $P_X$ of $X$.




Figure 2.2 Probability histogram of a symmetric discrete distribution 


Figure 2.2 shows the probability histogram of a discrete random variable. It graphically
illustrates the frequency distribution of the occurrence of the values $x_i$ of $X$. In
this special case, the distribution is symmetric around $x_3$, i.e.

$$p_0 = p_6, \quad p_1 = p_5, \quad \text{and} \quad p_2 = p_4.$$


Hint For technical reasons it is frequently practical to renumber the $x_i$ and $p_i$ and start with
$x_1$ ($p_1$) instead of $x_0$ ($p_0$). In what follows, no further reference will be made regarding this.
Moreover, the notation $p_i$ will be preferred to $f(x_i)$.


Example 2.1 (uniform distribution) A random variable $X$ is uniformly distributed
over its range $R = \{1, 2, \ldots, m\}$ if it has the probability distribution

$$p_i = P(X = x_i) = \frac{1}{m}; \quad i = 1, 2, \ldots, m; \quad m < \infty.$$

The conditions (2.6) are fulfilled. Thus, $X$ is the outcome of a Laplace random experiment
(section 1.3), since every value of $X$ has the same chance to occur. The cumulative
probabilities are $s_i = i/m$, $i \le m$. The corresponding distribution function is

$$F(x) = P(X \le x) = \begin{cases} 0 & \text{for } x < 1, \\ i/m & \text{for } i \le x < i+1,\ i = 1, 2, \ldots, m-1, \\ 1 & \text{for } m \le x. \end{cases}$$ □


Example 2.2 The leaves of Fraxinus excelsior (an ash tree) have an odd number of
leaflets. This number varies from 3 to 11. A sample of n = 300 leaves had been taken
from a tree. Let $X$ be the number of leaflets of a randomly picked leaf from this sample.
Then $X$ is a random variable with range $R = \{3, 5, 7, 9, 11\}$.


Table 2.1 shows the probability distribution of $X$: The first column contains the possible
number of leaflets $i$, the second column the number $n_i$ of leaves with $i$ leaflets,
the third one the probability that a randomly chosen leaf from the sample has $i$ leaflets:
$p_i = n_i/n$. (In terms of mathematical statistics, $p_i$ is the relative frequency of the
occurrence of leaves with $i$ leaflets in the sample.) The fourth column contains the
cumulative probabilities $s_i$ (cumulative frequencies).


  i | n_i | p_i    | s_i
  3 |   8 | 0.0267 | 0.0267
  5 |  36 | 0.1200 | 0.1467
  7 | 108 | 0.3600 | 0.5067
  9 | 118 | 0.3933 | 0.9000
 11 |  30 | 0.1000 | 1.0000


Table 2.1 Distribution of leaflets at leaves of Fraxinus excelsior
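The third and fourth columns of Table 2.1 follow from the counts in the second column. A short sketch of this computation (the variable names are chosen here for illustration):

```python
from itertools import accumulate

# Observed counts n_i of leaves with i leaflets (second column of Table 2.1).
counts = {3: 8, 5: 36, 7: 108, 9: 118, 11: 30}
n = sum(counts.values())                       # sample size

p = {i: n_i / n for i, n_i in counts.items()}  # p_i = n_i / n
s = dict(zip(p, accumulate(p.values())))       # cumulative probabilities s_i

for i in counts:
    print(f"{i:2d} {counts[i]:4d} {p[i]:.4f} {s[i]:.4f}")

mode = max(p, key=p.get)                       # most likely number of leaflets
```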


Figure 2.3 shows the distribution function and the probability histogram of $X$. For
instance, $s_7 = 0.5067$ is the probability that a randomly selected leaf has at most 7
leaflets, and a randomly drawn leaf from the sample most likely has 9 leaflets. □

Figure 2.3 Distribution function a) and histogram b) for example 2.2


As pointed out before, the probability distribution and the range $R$ contain all the information
on $X$. However, to get quick information on essential features of a random
variable, it is desirable to condense as much as possible of this information to some
numerical parameters. In what follows, let the range of $X$ be $R = \{x_0, x_1, \ldots\}$. If the
range is finite, i.e., $R = \{x_0, x_1, \ldots, x_m\}$, $m < \infty$, the formulas to be given stay valid when
letting $p_{m+1} = p_{m+2} = \cdots = 0$.


Mean Value If a random variable $X$ has the finite range $R = \{x_0, x_1, \ldots, x_m\}$, then at
first glance the average result of a random experiment with outcome $X$ is

$$\bar{x} = \frac{1}{m+1} \sum_{i=0}^{m} x_i,$$

the arithmetic mean of all possible values of $X$. But this is only true if every value of
$X$ has the same chance to occur, as is the case with a uniformly distributed random
variable. Otherwise, those realizations of $X$ contribute most to the average result (relative
to their absolute value) which occur more frequently than other realizations.
To illustrate this, let us assume that in a series of $n$ random experiments $n_0$ times $x_0$,
$n_1$ times $x_1$, ..., and $n_m$ times $x_m$ occurred. Then $n = n_0 + n_1 + \cdots + n_m$, and
the arithmetic mean of all observations is

$$\bar{x} = \frac{1}{n} (n_0 x_0 + n_1 x_1 + \cdots + n_m x_m) = \frac{n_0}{n} x_0 + \frac{n_1}{n} x_1 + \cdots + \frac{n_m}{n} x_m.$$

The ratio $n_i/n$ is the relative frequency of the occurrence of $x_i$ out of the total of $n$
observations, which, as will be shown in section 5.2.2, tends for all $i = 0, 1, \ldots, m$ to the
probability $p_i = P(X = x_i)$ as $n \to \infty$. Thus, the following definition is well motivated:


The mean value, or expected value, or simply the mean of a random variable $X$ is

$$E(X) = \sum_{i=0}^{\infty} x_i\, p_i \quad \text{given that} \quad \sum_{i=0}^{\infty} |x_i|\, p_i < \infty. \qquad (2.7)$$

Thus, the mean value of a discrete random variable $X$ is the weighted sum of all its
possible values, where the weights of the $x_i$ are their respective probabilities. The
convergence condition in (2.7) makes sure that $E(X)$ exists (i.e., is finite). Note that

$$E(|X|) = \sum_{i=0}^{\infty} |x_i|\, p_i. \qquad (2.8)$$

A statistical motivation for the mean value of a random variable is the following one:
If one and the same random experiment with outcome $X$ is repeated $n$ times and the
results are $x_{i_1}, x_{i_2}, \ldots, x_{i_n}$, the arithmetic mean $\frac{1}{n} \sum_{k=1}^{n} x_{i_k}$ tends to $E(X)$ as $n \to \infty$.


If $X$ is nonnegative with range $R = \{0, 1, 2, \ldots\}$, then its mean value can be written in
the following way:

$$E(X) = \sum_{i=0}^{\infty} i\, p_i = \sum_{i=1}^{\infty} P(X \ge i) = \sum_{i=1}^{\infty} \sum_{k=i}^{\infty} p_k. \qquad (2.9)$$

If $h(x)$ is a real function, then the mean value of the random variable $Y = h(X)$ is

$$E(Y) = \sum_{i=0}^{\infty} h(x_i)\, p_i. \qquad (2.10)$$
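Both representations of the mean in (2.9) as well as formula (2.10) can be checked on a small example. The distribution below is hypothetical and chosen only for illustration; exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction

# Hypothetical distribution on {0, 1, 2, 3}, chosen only for illustration.
p = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

mean_direct = sum(i * p_i for i, p_i in enumerate(p))     # sum of i * p_i
mean_tails = sum(sum(p[i:]) for i in range(1, len(p)))    # sum of P(X >= i)

def mean_of(h):
    """E(h(X)) = sum of h(x_i) * p_i, formula (2.10)."""
    return sum(h(i) * p_i for i, p_i in enumerate(p))

print(mean_direct, mean_tails, mean_of(lambda x: x ** 2))  # 3/2 3/2 3
```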


In this formula, $y_i = h(x_i)$, $i = 0, 1, \ldots$, are the possible values which the random variable
$Y$ can take on. Since the $y_i$ occur with the same probabilities as the $x_i$, namely
$p_i$, (2.10) gives indeed the mean value of $Y$. As a special case, let $Y = X^n$. Then the
mean value of $X^n$ is given by (2.10) with $h(x_i) = x_i^n$:

$$E(X^n) = \sum_{i=0}^{\infty} x_i^n\, p_i, \quad n = 0, 1, \ldots.$$

$E(X^n)$ is called the nth (ordinary) moment of $X$. Therefore, the mean value $E(X)$ is
the first (ordinary) moment of $X$.




Variability In addition to its mean value $E(X)$, one is interested in the variability
(scatter, fluctuations) of the outcomes of a random experiment (given by $X$) in series
of repetitions of this experiment. These fluctuations are measured by the absolute
distances of the values $x_i$ from $E(X)$: $|x_i - E(X)|$. This leads to the mean absolute
linear deviation of a random variable $X$ from its mean value:

$$E(|X - E(X)|) = \sum_{i=0}^{\infty} |x_i - E(X)|\, p_i. \qquad (2.11)$$

The mean absolute linear deviation of $X$ is a special case of the nth absolute central
(ordinary) moment of $X$:

$$M_n = E(|X - E(X)|^n) = \sum_{i=0}^{\infty} |x_i - E(X)|^n\, p_i, \quad n = 0, 1, \ldots.$$


For practical and theoretical reasons, one usually prefers to work with the squared
deviation of the $x_i$ from $E(X)$: $(x_i - E(X))^2$. The mean value of the squared deviation
of a random variable $X$ from its mean value $E(X)$ is called the variance of $X$ and denoted
as Var(X):

$$\mathrm{Var}(X) = E(X - E(X))^2 = \sum_{i=0}^{\infty} (x_i - E(X))^2\, p_i. \qquad (2.12)$$

The variance is obviously equal to the second absolute central moment of $X$. The
square root of the variance, $\sqrt{\mathrm{Var}(X)}$, is called the standard deviation of $X$. For any
random variable $X$, the following notation is common:

$$\sigma^2 = \mathrm{Var}(X), \quad \sigma = \sqrt{\mathrm{Var}(X)}.$$

Note that in determining Var(X), formula (2.10) has been used with $h(x_i) = (x_i - E(X))^2$.
From (2.12), for any constant $h$,

$$\mathrm{Var}(hX) = h^2\, \mathrm{Var}(X).$$

There is a useful relationship between the variance and the second moment of $X$:

$$\mathrm{Var}(X) = E(X - E(X))^2 = E(X^2) - 2E[X E(X)] + E[(E(X))^2] = E(X^2) - 2(E(X))^2 + (E(X))^2,$$

so that

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2. \qquad (2.13)$$
The coefficient of variation of $X$ is

$$V(X) = \sigma / |E(X)|.$$

Variance, standard deviation, and coefficient of variation are all measures of the variability
of $X$. The coefficient of variation is the most informative in this regard, since it
not only takes into account the deviation of $X$ from its mean value, but also relates this
deviation to the mean value of $X$. For instance, if the variabilities of two random variables
$X$ and $Y$ with equal variances Var(X) = Var(Y) = 5, but with different mean values
$E(X) = 10$ and $E(Y) = 100$, have to be compared, then it is already intuitively obvious
that the scatter of $X$ is more pronounced than that of $Y$:

$$V(X) = \sqrt{5}/10 \approx 0.224, \quad V(Y) = \sqrt{5}/100 \approx 0.0224.$$




Continuation of Example 2.2 The mean number of leaflets is

$$E(X) = 3 \cdot 0.0267 + 5 \cdot 0.1200 + 7 \cdot 0.3600 + 9 \cdot 0.3933 + 11 \cdot 0.1000 = 7.8398.$$

The variance of the number of leaflets is

$$\mathrm{Var}(X) = (3 - 7.8398)^2 \cdot 0.0267 + (5 - 7.8398)^2 \cdot 0.12 + (7 - 7.8398)^2 \cdot 0.36$$
$$+\ (9 - 7.8398)^2 \cdot 0.3933 + (11 - 7.8398)^2 \cdot 0.1 = 3.3751.$$

Altogether,

$$E(X) = 7.8398, \quad \mathrm{Var}(X) = 3.3751, \quad \sqrt{\mathrm{Var}(X)} = 1.8371, \quad V(X) = 0.2343.$$


It is interesting to compare the standard deviation to the mean absolute linear deviation,
since one expects that $E(|X - E(X)|)$ is somewhere in the order of the standard
deviation: From (2.11),

$$E(|X - E(X)|) = |3 - 7.8398| \cdot 0.0267 + |5 - 7.8398| \cdot 0.12 + |7 - 7.8398| \cdot 0.36$$
$$+\ |9 - 7.8398| \cdot 0.3933 + |11 - 7.8398| \cdot 0.1 = 1.5447.$$

Thus, $E(|X - E(X)|) = 1.5447 < \sqrt{\mathrm{Var}(X)} = 1.8371$. □
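The numerical parameters of example 2.2 can be recomputed from the rounded probabilities of Table 2.1. The following sketch evaluates the formulas (2.7), (2.11), (2.12) and (2.13); the variable names are chosen here and are not part of the text.

```python
from math import sqrt

# Rounded probabilities p_i from Table 2.1 (leaflets of Fraxinus excelsior).
p = {3: 0.0267, 5: 0.1200, 7: 0.3600, 9: 0.3933, 11: 0.1000}

mean = sum(x * p_x for x, p_x in p.items())                     # E(X)
var = sum((x - mean) ** 2 * p_x for x, p_x in p.items())        # formula (2.12)
var_alt = sum(x * x * p_x for x, p_x in p.items()) - mean ** 2  # formula (2.13)
std = sqrt(var)
cv = std / abs(mean)                                            # V(X)
mad = sum(abs(x - mean) * p_x for x, p_x in p.items())          # formula (2.11)

print(round(mean, 4), round(var, 4), round(cv, 4), round(mad, 4))
```

Both variance formulas give the same value up to floating-point rounding, and the mean absolute linear deviation is indeed smaller than the standard deviation.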


2.2.2 Important Discrete Probability Distributions 


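In this section, the following finite and infinite series are needed:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \qquad (2.14)$$

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \qquad (2.15)$$

$$\sum_{i=0}^{\infty} x^i = \frac{1}{1-x}, \quad |x| < 1 \quad \text{(geometric series)} \qquad (2.16)$$

$$\sum_{i=0}^{\infty} i\, x^i = \frac{x}{(1-x)^2}, \quad |x| < 1 \qquad (2.17)$$

$$\sum_{i=0}^{n} x^i = \frac{1 - x^{n+1}}{1-x}, \quad x \ne 1 \qquad (2.18)$$

$$\sum_{i=0}^{\infty} \frac{x^i}{i!} = e^x, \quad |x| < +\infty \quad \text{(exponential series)} \qquad (2.19)$$

$$\sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i} = (x+y)^n \quad \text{(binomial series)} \qquad (2.20)$$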


Note that in view of (2.6) every probability distribution $\{p_0, p_1, \ldots\}$ of a discrete random
variable must fulfill the normalizing condition

$$\sum_{i=0}^{\infty} p_i = 1. \qquad (2.21)$$


Uniform Distribution A random variable $X$ with range $R = \{x_1, x_2, \ldots, x_n\}$ has a
discrete uniform distribution if

$$p_i = P(X = x_i) = \frac{1}{n}, \quad i = 1, 2, \ldots, n.$$

Thus, each possible value has the same probability. The normalizing condition (2.21)
is obviously fulfilled. Mean value and variance are

$$E(X) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \mathrm{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Thus, $E(X)$ is the arithmetic mean of all values which $X$ can assume.
In particular, if $x_i = i$ for $i = 1, 2, \ldots, n$, then the formulas (2.14) and (2.15) yield

$$E(X) = \frac{n+1}{2}, \quad \mathrm{Var}(X) = \frac{(n-1)(n+1)}{12}. \qquad (2.22)$$


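For instance, if $X$ is the outcome of 'rolling a die', then $R = \{1, 2, \ldots, 6\}$ and $p_i = 1/6$
so that

$$E(X) = 3.5, \quad \mathrm{Var}(X) \approx 2.92, \quad \sqrt{\mathrm{Var}(X)} \approx 1.71, \quad V(X) \approx 0.49 = 49\%,$$

and $E(|X - E(X)|) = \frac{1}{6}(|1 - 3.5| + |2 - 3.5| + \cdots + |6 - 3.5|) = 1.5$ so that

$$E(|X - E(X)|) = 1.5 < \sqrt{\mathrm{Var}(X)} \approx 1.71.$$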
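The values for the die can be verified directly from the definitions, and the closed formulas (n+1)/2 and (n-1)(n+1)/12 for the uniform distribution on {1, 2, ..., n} can be checked for other n in the same way. A minimal sketch (the function name is chosen for illustration):

```python
from fractions import Fraction
from math import sqrt

def uniform_moments(n):
    """Mean, variance and mean absolute deviation of X uniform on {1, ..., n}."""
    p = Fraction(1, n)
    mean = sum(i * p for i in range(1, n + 1))
    var = sum((i - mean) ** 2 * p for i in range(1, n + 1))
    mad = sum(abs(i - mean) * p for i in range(1, n + 1))
    return mean, var, mad

mean, var, mad = uniform_moments(6)      # fair die
cv = sqrt(var) / mean                    # coefficient of variation
print(mean, var, mad, round(cv, 2))      # 7/2 35/12 3/2 0.49
```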


Bernoulli Distribution A random variable $X$ with range $R = \{0, 1\}$ has a Bernoulli
distribution with parameter $p$, $0 < p < 1$, if

$$p_0 = P(X = 0) = 1 - p, \quad p_1 = P(X = 1) = p. \qquad (2.23)$$

Mean value and variance are

$$E(X) = p, \quad \mathrm{Var}(X) = p(1-p). \qquad (2.24)$$

This is easily verified:

$$E(X) = 0 \cdot (1-p) + 1 \cdot p = p,$$
$$\mathrm{Var}(X) = (0-p)^2 (1-p) + (1-p)^2 p = p(1-p).$$

The random experiment which leads to the Bernoulli distribution is called a Bernoulli
trial. It has two outcomes: event $A$ and its complementary event $\bar{A}$. Event $A$ occurs
with probability $p$, and event $\bar{A}$ occurs with probability $1 - p$. The random variable $X$
defined by (2.23) assigns a "1" to event $A$ and a "0" to event $\bar{A}$:

$$X = \begin{cases} 1 & \text{if } A \text{ has occurred}, \\ 0 & \text{if } \bar{A} \text{ has occurred}. \end{cases} \qquad (2.25)$$

The occurrence of $A$ is frequently referred to as success. With this terminology, $X$ is
the indicator variable for the occurrence of a success or a failure, respectively. Generally,
since $X$ can only assume two values, it is called a (random) binary variable.
Specifically, since the two possible values of $X$ are 0 and 1, it is a (0, 1)-variable.


Geometric Distribution A random variable $X$ with range $R = \{1, 2, \ldots\}$ has a geometric
distribution with parameter $p$, $0 < p < 1$, if

$$p_i = P(X = i) = p(1-p)^{i-1}, \quad i = 1, 2, \ldots. \qquad (2.26)$$

In view of the geometric series (2.16), the normalizing condition (2.21) is fulfilled.
Mean value and variance are

$$E(X) = 1/p, \quad \mathrm{Var}(X) = (1-p)/p^2.$$

To verify these formulas, use the series (2.16) and (2.17) as well as formula (2.13). A
more elegant derivation is given in section 2.5.1.


For instance, if $X$ is the random integer indicating how often one has to toss a
die to get a '6' (= success) for the first time, then $X$ has a geometric distribution with

$$p = 1/6, \quad E(X) = 6, \quad \mathrm{Var}(X) = 30, \quad \text{and} \quad \sqrt{\mathrm{Var}(X)} \approx 5.4772.$$

Generally, a geometrically distributed random variable $X$ can be interpreted as

the number of independent Bernoulli trials one has to carry out to have a
'success' for the first time.


The geometric distribution is also defined with range $R = \{0, 1, \ldots\}$ and

$$p_i = P(X = i) = p(1-p)^i, \quad i = 0, 1, 2, \ldots. \qquad (2.27)$$

In this case, mean value and variance are

$$E(X) = (1-p)/p, \quad \mathrm{Var}(X) = (1-p)/p^2.$$
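The mean values and variances of both versions of the geometric distribution can be checked numerically by truncating the defining series. This is only a plausibility check, not a proof; the function name and the truncation after 2000 terms are assumptions made for this sketch.

```python
def geometric_moments(p, start=1, terms=2000):
    """Approximate E(X) and Var(X) by truncating the defining series.

    start=1 uses p_i = p(1-p)^(i-1), i = 1, 2, ... as in (2.26);
    start=0 uses p_i = p(1-p)^i,     i = 0, 1, ... as in (2.27).
    """
    probs = [(i, p * (1 - p) ** (i - start)) for i in range(start, start + terms)]
    mean = sum(i * q for i, q in probs)
    second = sum(i * i * q for i, q in probs)
    return mean, second - mean ** 2          # variance via (2.13)

print(geometric_moments(1 / 6))              # close to (6, 30)
print(geometric_moments(1 / 6, start=0))     # close to (5, 30)
```

Shifting the range from {1, 2, ...} to {0, 1, ...} lowers the mean by 1 and leaves the variance unchanged.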


Example 2.3 ('nonaging property' of the geometric distribution) Let $X$ be a geometrically
distributed random variable with parameter $p$. For any integers $m \ge 0$ and $n \ge 1$
determine the conditional probability $P(X = m+n \mid X > m)$.


In view of the geometric series (2.16) with $x = 1 - p$,

$$P(X > m) = \sum_{i=m+1}^{\infty} p(1-p)^{i-1} = p(1-p)^m \sum_{i=0}^{\infty} (1-p)^i = (1-p)^m.$$

By the formula of conditional probability (1.22) and since the event "$X = m+n$"
implies the event "$X > m$",

$$P(X = m+n \mid X > m) = \frac{P(X = m+n \cap X > m)}{P(X > m)} = \frac{P(X = m+n)}{P(X > m)} = \frac{p(1-p)^{m+n-1}}{(1-p)^m} = p(1-p)^{n-1}.$$

Hence,

$$P(X = m+n \mid X > m) = P(X = n), \quad m, n = 1, 2, \ldots. \qquad (2.28)$$
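Identity (2.28) can be confirmed with exact arithmetic for any concrete parameter. In the sketch below, $p = 1/6$ is an illustrative choice and the function names are introduced only for this check.

```python
from fractions import Fraction

def pmf(i, p):
    """Geometric probability P(X = i) = p(1-p)^(i-1), i = 1, 2, ... (2.26)."""
    return p * (1 - p) ** (i - 1)

def survival(m, p):
    """P(X > m) = (1-p)^m."""
    return (1 - p) ** m

def conditional(m, n, p):
    """P(X = m+n | X > m) = P(X = m+n) / P(X > m)."""
    return pmf(m + n, p) / survival(m, p)

p = Fraction(1, 6)   # illustrative parameter: waiting for a '6' with a fair die
print(conditional(4, 3, p) == pmf(3, p))   # True
```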


This result has an interesting interpretation: If $X$ is the lifetime of a technical unit
which can only fail at time points $n = 1, 2, \ldots$, and which has survived $m$ time units,
then the residual lifetime of the unit has the same lifetime distribution as the unit at
the start of its operation, i.e. as a new unit. Such a unit is called nonaging. □


Binomial Distribution A random variable $X$ with range $R = \{0, 1, \ldots, n\}$ has a
binomial distribution with parameters $p$ and $n$ if

$$p_i = P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}, \quad i = 0, 1, \ldots, n. \qquad (2.29)$$

Frequently the notation $p_i = b(i, n, p)$ is used.
In view of the binomial series (2.20) with $x = p$ and $y = 1 - p$, the normalizing condition
(2.21) is fulfilled. Mean value and variance are

$$E(X) = np, \quad \mathrm{Var}(X) = np(1-p). \qquad (2.30)$$

The proofs will be given in section 2.5.1.
The binomial distribution occurs in the following situation: A Bernoulli trial, whose
outcome is the (0,1)-indicator variable for the occurrences of events $A$ and $\bar{A}$ as given
by (2.25), is independently repeated $n$ times. (Independence in the sense of definition
1.5: The respective outcomes of the $n$ Bernoulli trials are independent random events.)
Let the outcome of the ith trial be $X_i$:

$$X_i = \begin{cases} 1 & \text{if } A \text{ has occurred}, \\ 0 & \text{if } \bar{A} \text{ has occurred}, \end{cases} \quad i = 1, 2, \ldots, n.$$


The outcome of a series of $n$ Bernoulli trials is a random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$,
whose components $X_i$ can take on values 0 or 1. The sum

$$X = \sum_{i=1}^{n} X_i$$

is equal to the random number of successes in a series of $n$ Bernoulli trials. $X$ has a
binomial distribution with parameters $n$ and $p$: In view of the product formula for
independent events (1.29), the probability that in $\mathbf{X}$ a '1' occurs $i$ times and a '0'
occurs $(n - i)$ times in a specific order is

$$p^i (1-p)^{n-i}.$$

There are $\binom{n}{i}$ different possibilities to order the $i$ '1's and $(n - i)$ '0's.
For instance, let $n = 3$. Then the probability that the vector (0, 1, 1) (first Bernoulli trial
is a failure, the second and third trials are successes) occurs is $(1-p)p^2$. But there are
$\binom{3}{2} = 3$ vectors with 1 failure and 2 successes, each having probability $(1-p)p^2$:

(1, 1, 0), (1, 0, 1), (0, 1, 1).


Hence, the probability that a series of three Bernoulli trials yields one failure and two
successes is $3p^2(1-p)$.
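The counting argument can be reproduced by brute force: enumerate all 0/1 vectors of length n, compute the probability of each vector by the product formula, and collect the probabilities according to the number of '1's. The sketch below does this for n = 3; the success probability 1/4 is an arbitrary illustrative value.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_by_enumeration(n, p):
    """Distribution of the number of successes, summed over all 0/1 vectors."""
    dist = [Fraction(0)] * (n + 1)
    for vec in product((0, 1), repeat=n):
        i = sum(vec)                              # number of '1's in the vector
        dist[i] += p ** i * (1 - p) ** (n - i)    # probability of this vector
    return dist

p = Fraction(1, 4)   # illustrative success probability (an assumption)
dist = binomial_by_enumeration(3, p)
print(dist[2] == 3 * p ** 2 * (1 - p))            # True
print(all(dist[i] == comb(3, i) * p ** i * (1 - p) ** (3 - i) for i in range(4)))  # True
```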


Example 2.4 A power station supplies power to 10 bulk consumers. They use power
independently of each other and in random time intervals, which, for each customer,
accumulate to 20% of the calendar time. What is the probability of the random event
$B$ that at a randomly chosen time point at least seven customers use power?


The problem leads to a Bernoulli trial, where the 'success event' $A$ for every customer
is 'using power'. By assumption, $p = P(A) = 0.2$. Let $B_i$ be the event that exactly
$i$ customers simultaneously use power. Then the event of interest is

$$B = B_7 \cup B_8 \cup B_9 \cup B_{10}.$$

The $B_i$ are disjoint so that

$$P(B) = \sum_{i=7}^{10} P(B_i) = \sum_{i=7}^{10} \binom{10}{i} (0.2)^i (0.8)^{10-i}$$
$$= 7.864 \cdot 10^{-4} + 7.373 \cdot 10^{-5} + 4.096 \cdot 10^{-6} + 1.024 \cdot 10^{-7}$$
$$= 0.000864.$$ □
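The four terms of this sum are quickly evaluated with formula (2.29). A minimal sketch (the helper `binomial_pmf` is a name introduced here):

```python
from math import comb

def binomial_pmf(i, n, p):
    """b(i, n, p) = C(n, i) p^i (1-p)^(n-i), formula (2.29)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

# Example 2.4: 10 customers, each using power with probability 0.2.
terms = [binomial_pmf(i, 10, 0.2) for i in range(7, 11)]
prob_B = sum(terms)
print(f"{prob_B:.6f}")   # 0.000864
```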


Example 2.5 From a large delivery of calculators a sample of size n = 100 is taken.
The delivery will be accepted if there are at most 4 defective calculators in the sample.
The average rate of defective calculators from the producer is known to be 2%.


1) What is the probability $P_{risk}$ that the delivery will be rejected (producer's risk)?
2) What is the probability $C_{risk}$ of accepting the delivery although it contains 7% defective
calculators (consumer's risk)?


1) Picking a defective calculator is declared a "success" (event $A$). The probability of
this event is $P(A) = 0.02$. Thus, the underlying Bernoulli trial has parameters $p = 0.02$
and $n = 100$. The probabilities $p_i$ that $i$ of the 100 calculators are defective are:

$$p_i = \binom{100}{i} (0.02)^i (0.98)^{100-i}, \quad i = 0, 1, \ldots, 100.$$

In particular,

$$p_0 = 0.1326, \quad p_1 = 0.2706, \quad p_2 = 0.2734, \quad p_3 = 0.1823, \quad p_4 = 0.0902$$

so that the producer's risk is

$$P_{risk} = 1 - p_0 - p_1 - p_2 - p_3 - p_4 = 0.0509.$$


2) Now a "success" (picking a defective calculator) has probability $p = P(A) = 0.07$
so that the probabilities $p_i$ to have $i$ defective calculators in a sample of 100 are

$$p_i = \binom{100}{i} (0.07)^i (0.93)^{100-i}, \quad i = 0, 1, \ldots, 100.$$

In particular,

$$p_0 = 0.0007, \quad p_1 = 0.0053, \quad p_2 = 0.0198, \quad p_3 = 0.0486, \quad p_4 = 0.0888.$$

Thus, the consumer's risk is $C_{risk} = p_0 + p_1 + p_2 + p_3 + p_4 = 0.1632$. Thus, the proposed
acceptance/rejection plan favors the producer. □
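Both risks can be recomputed without evaluating large binomial coefficients. The sketch below uses the recursion $p_0 = (1-p)^n$, $p_{i+1} = \frac{n-i}{i+1} \cdot \frac{p}{1-p}\, p_i$, which follows from (2.29) by dividing two successive probabilities; the function name is an assumption made for this illustration.

```python
def binomial_probs(n, p, upto):
    """p_0, ..., p_upto of the binomial distribution via the recursion
    p_0 = (1-p)^n,  p_{i+1} = (n-i)/(i+1) * p/(1-p) * p_i."""
    probs = [(1 - p) ** n]
    for i in range(upto):
        probs.append(probs[-1] * (n - i) / (i + 1) * p / (1 - p))
    return probs

# Example 2.5: sample of 100, acceptance if at most 4 calculators are defective.
producer_risk = 1 - sum(binomial_probs(100, 0.02, 4))   # rejection at 2% defective
consumer_risk = sum(binomial_probs(100, 0.07, 4))       # acceptance at 7% defective
print(round(producer_risk, 3), round(consumer_risk, 3)) # 0.051 0.163
```

The small difference between the producer's risk computed here and the value 0.0509 in the example is due to the rounding of the individual $p_i$ to four decimal places.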


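In examples like the previous one, the successive calculation of the probabilities $p_i$
can be done efficiently by using the following recursion formula, which follows from (2.29):

$$p_{i+1} = \frac{n-i}{i+1} \cdot \frac{p}{1-p}\, p_i, \quad i = 0, 1, \ldots, n-1, \quad p_0 = (1-p)^n.$$


Negative Binomial Distribution A random variable $X$ with range $R = \{0, 1, \ldots\}$ has
a negative binomial distribution with parameters $p$ and $r$, $0 < p < 1$, $r > 0$, if

$$p_i = P(X = i) = \binom{i-1+r}{i} p^i (1-p)^r, \quad i = 0, 1, \ldots. \qquad (2.31)$$

Equivalently,

$$p_i = P(X = i) = \binom{-r}{i} (-p)^i (1-p)^r; \quad i = 0, 1, \ldots.$$

Mean value and variance are

$$E(X) = \frac{pr}{1-p}, \quad \mathrm{Var}(X) = \frac{pr}{(1-p)^2}. \qquad (2.32)$$


If $r$ is a positive integer, then, in a series of independent Bernoulli trials, $X$ given by
(2.31) can be interpreted as the number of trials with the outcome of probability $p$ which
are carried out before the complementary outcome (probability $1 - p$) has occurred for the
rth time. For $r = 1$, the negative binomial distribution reduces to a geometric distribution
of the form (2.27) with $p$ and $1 - p$ interchanged.


The negative binomial distribution is also called Pascal distribution.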


Hypergeometric Distribution A random variable $X$ with range

$$R = \{0, 1, \ldots, \min(n, M)\}$$

has a hypergeometric distribution with parameters $M$, $N$, and $n$, $M \le N$, $n \le N$, if

$$p_m = P(X = m) = \frac{\binom{M}{m} \binom{N-M}{n-m}}{\binom{N}{n}}, \quad m = 0, 1, \ldots, \min(n, M). \qquad (2.33)$$

Mean value and variance are:

$$E(X) = n \frac{M}{N}, \quad \mathrm{Var}(X) = n \frac{M}{N} \left(1 - \frac{M}{N}\right) \left(1 - \frac{n-1}{N-1}\right). \qquad (2.34)$$


As an application, consider the lottery '6 out of 49'. In this case, $M = n = 6$, $N = 49$,
and $p_m$ is the probability that a gambler hits exactly $m$ winning numbers with one
coupon (see example 1.7). More generally, hypergeometrically distributed random
variables occur in the following situation: In a set of $N$ elements, $M$ elements belong
to type 1 and $N - M$ elements to type 2. A sample of $n$ elements is randomly taken
from this set. What is the probability that there are $m$ elements of type 1 (and, hence,
$n - m$ elements of type 2) in this sample?
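For the lottery '6 out of 49', the probabilities (2.33) and the parameters (2.34) can be evaluated exactly. The following sketch uses Python's built-in binomial coefficient; the function name `hypergeom_pmf` is introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(m, M, N, n):
    """P(X = m) = C(M, m) C(N-M, n-m) / C(N, n), formula (2.33)."""
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))

# Lottery '6 out of 49': M = n = 6 winning numbers among N = 49.
lotto = [hypergeom_pmf(m, 6, 49, 6) for m in range(7)]
mean = sum(m * p_m for m, p_m in enumerate(lotto))

print(lotto[6])   # 1/13983816, the probability of six correct numbers
print(mean)       # 36/49, in agreement with n * M / N from (2.34)
```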


If $X$ is the random number of type 1 elements in the sample, then $X$ has the distribution
(2.33): There are $\binom{M}{m}$ possibilities to select from $M$ type 1-elements exactly $m$,
and to each of these possibilities there are $\binom{N-M}{n-m}$ possibilities to select from $N - M$
type 2-elements exactly $n - m$. The product of both numbers is the number of favorable
cases for the occurrence of the event '$X = m$'. Finally, there are $\binom{N}{n}$ possibilities
to select $n$ elements from a total of $N$ elements. Problems of this kind are typical ones
in statistical quality control.


Example 2.6 A customer knows that on average 4% of parts delivered by a manufacturer
are defective and has accepted this percentage. To check whether the manufacturer
exceeds this limit, the customer takes from each batch of 800 parts randomly a
sample of size 80 and accepts the delivery if there are at most 3 defective parts in the
sample. What is the probability that the customer accepts a batch which contains 50
defective parts? In this case,

$$N = 800, \quad M = 50, \quad \text{and} \quad n = 80.$$

Let $X$ be the random number of defective parts in the sample. Then the probabilities
$p_i = P(X = i)$ are

$$p_i = \frac{\binom{50}{i} \binom{750}{80-i}}{\binom{800}{80}}, \quad i = 0, 1, 2, 3.$$

To five decimal places, the values are

$$p_0 = 0.00431, \quad p_1 = 0.02572, \quad p_2 = 0.07406, \quad p_3 = 0.13734.$$

Thus, the acceptance probability $C_{risk}$ of the delivery (consumer's risk) is

$$C_{risk} = p_0 + p_1 + p_2 + p_3 = 0.24143.$$

Note that according to agreement the average number of faulty parts in a batch is
supposed to be 32. □
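The acceptance probability of example 2.6 can be recomputed from (2.33) with exact integer binomial coefficients, which avoids overflow problems with numbers as large as $\binom{800}{80}$. A minimal sketch (the function name `p` mirrors the notation $p_i$ and is otherwise arbitrary):

```python
from math import comb

N, M, n = 800, 50, 80          # batch size, defective parts in batch, sample size

def p(i):
    """P(X = i) for the number X of defective parts in the sample, (2.33)."""
    return comb(M, i) * comb(N - M, n - i) / comb(N, n)

acceptance = sum(p(i) for i in range(4))   # at most 3 defective parts in the sample
print([round(p(i), 5) for i in range(4)], round(acceptance, 4))
```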


Remark When comparing examples 2.5 and 2.6, the reader will notice that despite the same
type of problem, for their solution first the binomial distribution and then the hypergeometric
distribution had been used. This is because in example 2.5 the size of the delivery, from which
a sample was taken, had been assumed to be large compared to the sample size, whereas in
example 2.6 the size of the batch, namely 800, is fairly small compared to the sample of
size 80 taken from it. If a sample of moderate size is taken from a sufficiently large set of
parts, then this will not significantly change the ratio between defective and nondefective parts
in the set, and one can assume that the probability $p$ of picking a defective part stays approximately
the same. In this case the binomial distribution will yield acceptable approximate values.
But if you want to apply the binomial distribution to small lots of parts, then, after every test
of a part, you have to return it to the lot. In this case the ratio between defective and nondefective
parts in the lot will not change either. The policy 'with replacement' is not always applicable,
since during a check a part is frequently 'tested to death'. Generally, applying the
binomial distribution (hypergeometric distribution) in quality control corresponds to "sampling with
replacement" ("sampling without replacement").


Example 2.7 Let N be the unknown number of adult zebras in a large National Park.
A number of M = 100 randomly selected zebras from the total population of this
park had been marked. A year later, a second sample from the whole adult zebra
population of this park was taken, this time of size n = 50. Amongst these there were
m = 7 zebras marked a year ago. Construct an estimate $\hat{N}$ for N with the property
that for $N = \hat{N}$ the probability of the observed event 'X = 7' is maximal.

This way of estimating N makes sense, since one does not assume to have observed
by chance an unlikely event instead of a very likely one. In this case, the
hypergeometrically distributed random variable X is the number of marked zebras in the
second sample of size n = 50. Let $p_7(N) = P(X = 7 \mid N)$ be the probability that there
are 7 marked zebras in the second sample given that the whole zebra population is of
size N. Then, by definition of $\hat{N}$, the following two inequalities must be true for
$N = \hat{N}$:

$$p_7(N+1) = \frac{\binom{100}{7}\binom{N+1-100}{50-7}}{\binom{N+1}{50}} \le p_7(N), \qquad (2.35)$$

$$p_7(N-1) = \frac{\binom{100}{7}\binom{N-1-100}{50-7}}{\binom{N-1}{50}} \le p_7(N). \qquad (2.36)$$

Inequality (2.35) is equivalent to

$$\binom{N-99}{43}\binom{N}{50} \le \binom{N-100}{43}\binom{N+1}{50}.$$

By making use of the representation (1.5) of the binomial coefficient (cancelling the
factors which are equal at both sides), this inequality reduces to

$$(N-99)(N-49) \le (N-142)(N+1) \quad \text{or} \quad 4993 \le 7N \quad \text{or} \quad 713.3 \le N.$$

Inequality (2.36) is equivalent to

$$\binom{N-101}{43}\binom{N}{50} \le \binom{N-100}{43}\binom{N-1}{50}.$$

Again by using (1.5), this inequality simplifies to

$$(N-143)N \le (N-100)(N-50) \quad \text{or} \quad 7N \le 5000 \quad \text{or} \quad N \le 714.3.$$

Hence, $713.3 \le \hat{N} \le 714.3$, so that $\hat{N} = 714$. □
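
The estimate of example 2.7 can also be found by brute force, simply evaluating the probability of the observed event for every candidate population size. The following sketch assumes an arbitrary upper search bound of 5000 animals; the function name `likelihood` is illustrative only.

```python
from math import comb

def likelihood(N, M=100, n=50, m=7):
    """P(X = m | N): probability of m marked animals in a sample of size n
    when M of the N animals are marked."""
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)

# Smallest feasible population: the M marked animals plus the n - m
# unmarked animals seen in the second sample.
candidates = range(100 + (50 - 7), 5001)   # the upper bound is an assumption
N_hat = max(candidates, key=likelihood)
print(N_hat)
```

The search returns the same value as the two inequalities (2.35) and (2.36), namely the integer part of Mn/m = 5000/7.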


If the probabilities p_m of the hypergeometric distribution have to be successively
calculated, then the following recursion formula is useful:

$$p_{m+1} = \frac{(n-m)(M-m)}{(m+1)(N-M-n+m+1)}\,p_m, \quad m = 0, 1, ..., \min(n, M).$$

Poisson Distribution A random variable X with range R = {0, 1, ...} has a Poisson
distribution with parameter λ if

$$p_i = P(X = i) = \frac{\lambda^i}{i!}\,e^{-\lambda}; \quad \lambda > 0, \; i = 0, 1, .... \qquad (2.37)$$

In view of the exponential series (2.19), the normalizing condition (2.21) is fulfilled.
Again by making use of (2.19),

$$E(X) = \sum_{i=0}^{\infty} i\,p_i = \sum_{i=1}^{\infty} i\,\frac{\lambda^i}{i!}\,e^{-\lambda}
= \lambda e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!}
= \lambda e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = \lambda e^{-\lambda} e^{+\lambda} = \lambda.$$

In section 2.2.3 it will be proved that the variance of X is equal to λ as well. Thus,

$$E(X) = \lambda, \quad Var(X) = \lambda. \qquad (2.38)$$

In the context of the Poisson distribution, X is frequently said to be the number of
Poisson events (occurring in time or in a spatial area).


Example 2.8 Let X be the random number of staff of a company who are on sick leave
on a given day. Long-term observations have shown that X has a Poisson distribution with
parameter λ = E(X) = 10.

What is the probability that the number of staff on sick leave on a given day is 9, 10,
or 11?

$$p_9 = \frac{10^{9}}{9!}\,e^{-10} = 0.1251,$$

$$p_{10} = \frac{10^{10}}{10!}\,e^{-10} = 0.1251,$$

$$p_{11} = \frac{10^{11}}{11!}\,e^{-10} = 0.1137.$$

Hence, the desired probability is

$$P(9 \le X \le 11) = p_9 + p_{10} + p_{11} = 0.3639. \quad □$$


With regard to applications, it is frequently more adequate to write the Poisson
probabilities (2.37) in the following form:

$$p_i = P(X = i) = \frac{(\lambda t)^i}{i!}\,e^{-\lambda t}; \quad \lambda > 0, \; t > 0; \; i = 0, 1, .... \qquad (2.39)$$

In this form, the Poisson distribution depends on the two parameters λ and t. The
parameter t refers to the time span or to the size of a spatial area (1-, 2-, or
3-dimensional), and λ refers to the mean number of Poisson events occurring per unit time,
per length unit, etc. Thus, t is a scale parameter.

Example 2.9 The number of trees with a stem diameter of at least 50 cm (measured at a
height of 1.3 m) per unit of area in a virgin forest stand follows a Poisson distribution
with parameter λ = 0.004 [m^-2].

What are the probabilities that in any subarea of 1000 m^2 in this stand there are
(1) none of such trees, and (2) exactly four of such trees?

Formula (2.39) is applied with λ = 0.004 [m^-2] and t = 1000 m^2. The results are

$$p_0 = e^{-0.004 \cdot 1000} = e^{-4} \approx 0.0183,$$

$$p_4 = \frac{(0.004 \cdot 1000)^4}{4!}\,e^{-0.004 \cdot 1000} = \frac{4^4}{4!}\,e^{-4} \approx 0.1954. \quad □$$


If the Poisson probabilities p_i have to be manually calculated, then the following
recursion formula is useful:

$$p_{i+1} = \frac{\lambda}{i+1}\,p_i; \quad i = 0, 1, ....$$
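
As a small illustration of this recursion, the following sketch recomputes the probabilities of example 2.8 (λ = 10) starting from p_0 = e^{-λ}. The function name `poisson_probs` is hypothetical.

```python
from math import exp

def poisson_probs(lam, i_max):
    """Return [p_0, ..., p_i_max] using p_{i+1} = lam / (i + 1) * p_i."""
    probs = [exp(-lam)]                    # p_0 = e^{-lam}
    for i in range(i_max):
        probs.append(lam / (i + 1) * probs[i])
    return probs

p = poisson_probs(10.0, 11)
print(round(p[9], 4), round(p[10], 4), round(p[11], 4))
print(round(p[9] + p[10] + p[11], 4))
```

Each step needs only one multiplication and one division, which avoids evaluating large powers and factorials separately.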
Approximations In view of the binomial coefficients involved in the definition of the
binomial and particularly of the hypergeometric distribution, the following
approximations are useful for numerical analysis with a calculator:

Poisson Approximation to the Binomial Distribution If n is sufficiently large and p
is sufficiently small, then

$$\binom{n}{i} p^i (1-p)^{n-i} \approx \frac{\lambda^i}{i!}\,e^{-\lambda}; \quad \lambda = np, \; i = 0, 1, .... \qquad (2.40)$$

As a rule of thumb, the Poisson approximation is applicable if

np < 10 and n > 1500p.
Binomial Approximation to the Hypergeometric Distribution

$$\frac{\binom{M}{m}\binom{N-M}{n-m}}{\binom{N}{n}} \approx \binom{n}{m} p^m (1-p)^{n-m} \quad \text{with } p = M/N; \; m = 0, 1, ..., n. \qquad (2.41)$$

As a rule of thumb, the binomial approximation to the hypergeometric distribution is
applicable if

0.1 < M/N < 0.9, n > 10, and n/N < 0.05.

This approximation is heuristically motivated by the remark after example 2.6.

Poisson Approximation to the Hypergeometric Distribution If n is sufficiently large
and p = M/N is sufficiently small, then

$$\frac{\binom{M}{m}\binom{N-M}{n-m}}{\binom{N}{n}} \approx \frac{\lambda^m}{m!}\,e^{-\lambda} \quad \text{with } \lambda = n \cdot \frac{M}{N}. \qquad (2.42)$$

This relation combines the approximations (2.40) and (2.41). As a rule of thumb, the
Poisson approximation is applicable if

M/N ≤ 0.1, n > 30, n/N < 0.05.


Example 2.10 On average, only 0.01% of trout eggs will develop into adult fish.
What is the probability $p_{\ge 3}$ that at least three adult fish arise from 40 000 eggs?

Let X be the random number of eggs out of 40 000 which develop into adult fish. It
is assumed that the eggs develop independently of each other. Then X has a binomial
distribution with parameters n = 40 000 and p = 0.0001. Thus,

$$p_i = P(X = i) = \binom{40\,000}{i} (0.0001)^i (0.9999)^{40\,000 - i},$$

where i = 0, 1, ..., 40 000. Since n is large and p is small, the Poisson distribution with
parameter λ = np = 4 can be used to approximately calculate the p_i:

$$p_i = \frac{4^i}{i!}\,e^{-4}; \quad i = 0, 1, ....$$

The desired probability is

$$p_{\ge 3} = 1 - p_0 - p_1 - p_2 = 1 - 0.0183 - 0.0733 - 0.1465 = 0.7619. \quad □$$


Continuation of Example 2.6 The binomial and the Poisson approximations to the
hypergeometric distribution are applied with

N = 800, M = 50, and n = 80.

Table 2.2 compares the exact values to the ones obtained from the approximations. The
third condition in the corresponding rules of thumb, namely n/N < 0.05, is not
fulfilled. □


         | p_0     | p_1     | p_2     | p_3     | C_risk
Exact    | 0.00431 | 0.02572 | 0.07406 | 0.13734 | 0.24143
Binomial | 0.00572 | 0.03053 | 0.08039 | 0.13934 | 0.25598
Poisson  | 0.00673 | 0.03369 | 0.08422 | 0.14037 | 0.26501


Table 2.2 Comparison of the exact probabilities to their approximate values (example 2.6)
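
The two approximate rows of Table 2.2 can be recomputed along the following lines. This is a sketch only; the helper names `binomial_pmf` and `poisson_pmf` are chosen for illustration and use p = M/N and λ = n M/N as in (2.41) and (2.42).

```python
from math import comb, exp, factorial

N, M, n = 800, 50, 80
p = M / N            # probability of a defective part, used in (2.41)
lam = n * p          # Poisson parameter, used in (2.42)

def binomial_pmf(i):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson_pmf(i):
    return lam**i / factorial(i) * exp(-lam)

binom = [binomial_pmf(i) for i in range(4)]
poiss = [poisson_pmf(i) for i in range(4)]
print("Binomial", [round(x, 5) for x in binom], round(sum(binom), 5))
print("Poisson ", [round(x, 5) for x in poiss], round(sum(poiss), 5))
print("n/N =", n / N)   # exceeds the rule-of-thumb bound 0.05
```

Both approximations overestimate the consumer's risk here, which is in line with the violated condition n/N < 0.05.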




2.3 CONTINUOUS RANDOM VARIABLES 


2.3.1 Probability Distribution 


The probability distribution of a discrete random variable Y is given by assigning to
each possible value of Y its probability according to the probability mass function of
Y. This approach is no longer feasible for random variables which can assume
noncountably many values. To illustrate the situation, let us recall the geometric
distribution over the interval [0, T] (page 15). This distribution defines the probability
distribution of a random variable X with noncountable, but bounded, range R = [0, T] in the
following way: The probability that X takes on a value out of an interval [a, b] with
0 ≤ a < b ≤ T < ∞ is

P(a ≤ X ≤ b) = (b - a)/T.

If b → a, then this 'interval probability' tends to 0: P(X = a) = 0. However,
to assign to each value of X the probability 0 cannot be the way to define the
probability distribution of a random variable with noncountably many values. Moreover, a
noncountable range R does not exclude the possibility that there exists a finite or
countably infinite set of values of X which actually have positive probabilities. Hence,
the probability distribution of X will be defined via the distribution function of X
(definition 2.2) as suggested in section 2.1:

F(x) = P(X ≤ x), x ∈ R. (2.43)

As shown there (formula 2.5), the interval probabilities for any interval I = [a, b] with
a < b and a, b ∈ R are given in terms of the distribution function by

P(a<X<b)=F(b)- F(a). (2.44) 


To exclude the case that F(x) has jumps for some x ∈ R (i.e. F(x) has points of
discontinuity), a continuous random variable is defined as follows:


A random variable is called continuous if its distribution function F(x) has a first
derivative f(x) = F'(x).


Equivalently, a random variable is called continuous if there is a function f(x) so that

$$F(x) = \int_{-\infty}^{x} f(u)\,du. \qquad (2.45)$$

The function

$$f(x) = F'(x) = dF(x)/dx, \quad x \in R, \qquad (2.46)$$

is called probability density function, probability density, or briefly density of X.
Sometimes the term probability mass function is used. A density has the properties

$$f(x) \ge 0, \qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1. \qquad (2.47)$$

Conversely, every function f(x) with properties (2.47) can be interpreted as the
density of a continuous random variable (Figure 2.4).

Figure 2.4 Relationship between distribution function and density


Note If a random variable X has a density f(x), then its distribution function need not exist in
an explicit form. This is the case if the integral in (2.45) cannot be evaluated in closed form.
Then, if no tables are available, the values of F(x) have to be calculated by numerical
integration of (2.45).


The range of X coincides with the set of all those x for which its density is positive:
R = {x, f(x) > 0} (Figure 2.4). In terms of the density, the interval probability (2.44)
has the form

$$P(a < X < b) = \int_a^b f(x)\,dx. \qquad (2.48)$$

Thus, the probability that X assumes a value between a and b is equal to the area
below f(x) and above the x-axis between a and b (Figure 2.4). This implies that the larger
f(x) is in a neighborhood of x, the larger is the probability that X assumes a value out
of this neighborhood.


Example 2.11 A popular example for a continuous probability distribution is the
exponential distribution with parameter λ: It has distribution function and density
(see Figure 2.5 a) and b))

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0, \end{cases} \qquad
f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases} \qquad (2.49)$$

A random variable with this distribution cannot take on negative values since
F(0) = P(X ≤ 0) = 0.

By (2.44), if λ = 1, a = 1, and b = 2, the probability that X takes on a value between
1 and 2 is $P(1 < X < 2) = F(2) - F(1) = (1 - e^{-2}) - (1 - e^{-1}) = 0.2325$. □


Figure 2.5 Distribution function a) and density b) of the exponential distribution

A motivation of the term 'probability density' follows from the definition of f(x) as

$$f(x) = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x},$$

so that, for small Δx,

$$f(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x} \quad \text{or} \quad f(x)\,\Delta x \approx F(x + \Delta x) - F(x). \qquad (2.50)$$

Hence, f(x) is indeed a probability per unit of x, and f(x)Δx is approximately the
probability that X takes on a value in the interval [x, x + Δx]. This is the reason why for
some heuristic derivations it is useful to interpret f(x)dx as the probability that X
takes on value x. Of course, for continuous random variables this probability is 0:

$$P(X = x) = \lim_{\Delta x \to 0} \left( F(x + \Delta x) - F(x) \right) = F(x) - F(x) = 0.$$


Example 2.12 The weights of 60 balls for ball bearings of the same type have been 
measured. Normally, one would expect that all balls have the same weight as prescri- 
bed by the standard for this type of ball bearings. In view of unavoidable technolog- 
ical fluctuations and measurement errors, this is not a realistic expectation. Table 2.3 
shows the results of the measurements [in g]: 


5.77 | 5.82 | 5.70 | 5.78 | 5.70 | 5.62 | 5.66 | 5.66 | 5.64 | 5.76 
5.73 | 5.80 | 5.76 | 5.76 | 5.68 | 5.66 | 5.62 | 5.72 | 5.70 | 5.78 
5.76 | 5.67 | 5.70 | 5.72 | 5.81 | 5.79 | 5.78 | 5.66 | 5.76 | 5.72 
5.70 | 5.78 | 5.76 | 5.70 | 5.76 | 5.76 | 5.62 | 5.68 | 5.74 | 5.74 
5.81 | 5.66 | 5.72 | 5.74 | 5.64 | 5.79 | 5.72 | 5.82 | 5.74 | 5.73 
5.81 | 5.77 | 5.60 | 5.72 | 5.78 | 5.76 | 5.74 | 5.70 | 5.64 | 5.78 


Table 2.3 Sample of 60 weight measurements of balls for ball bearings of the same type 


The data fluctuate between 5.60 and 5.82. This interval is called the range of the
sample. Of course, the weights of the balls can in principle assume any value within
the range, but the accuracy of the measurement method applied is restricted to two
decimals after the point. To get an idea of the frequency distribution of the data, they
are partitioned into class intervals (or cells). In Table 2.4, the integer n_i denotes the
number of measurements which belong to class i, and p_i = n_i/n with n = 60 is the
relative frequency of the random event A_i = 'a measurement is in class interval i'. A
ball is randomly selected from the data set. Let X be the number of the class which
the weight of this ball belongs to. Then X is a discrete random variable with range
{1, 2, ..., 6} and probability distribution

p_i = P(X = i) = n_i/n, i = 1, 2, ..., 6.

The corresponding cumulative probabilities s_i are

s_i = p_1 + p_2 + ... + p_i, i = 1, 2, ..., 6; s_6 = 1.


X | class        | n_i | p_i    | s_i
1 | [5.59-5.63)  |  4  | 0.0667 | 0.0667
2 | [5.63-5.67)  |  8  | 0.1333 | 0.2000
3 | [5.67-5.71)  | 10  | 0.1667 | 0.3667
4 | [5.71-5.75)  | 13  | 0.2167 | 0.5834
5 | [5.75-5.79)  | 17  | 0.2833 | 0.8667
6 | [5.79-5.83)  |  8  | 0.1333 | 1


Table 2.4 Probability distribution of X for example 2.12


Now we essentially are in the same situation as in example 2.3. In Table 2.4 the
notation [a_i, a_{i+1}) means that the left end point a_i belongs to the class interval,
but the right end point a_{i+1} does not.


Figure 2.6 Distribution function a) and probability histogram b) of X (example 2.12)


The jump size of the distribution function between the i th and the (i + 1) th class is
determined by the data belonging to the i th class, i.e., by the probabilities p_i = n_i/n:

$$F_{60}(x) = P(X \le x) = \begin{cases}
0 & \text{for } x < 5.63, \\
0.0667 & \text{for } 5.63 \le x < 5.67, \\
0.2000 & \text{for } 5.67 \le x < 5.71, \\
0.3667 & \text{for } 5.71 \le x < 5.75, \\
0.5834 & \text{for } 5.75 \le x < 5.79, \\
0.8667 & \text{for } 5.79 \le x < 5.83, \\
1 & \text{for } 5.83 \le x.
\end{cases}$$


The histogram is an approximation to the probability density of the random weight Y
of the balls, which actually is a continuous random variable, for the following reason:
If the length of the class intervals is set to one, which can always be done by
scaling the x-axis accordingly (see Figure 2.10, page 70), then the area of the column
over this interval is the probability p_i = n_i/n that Y takes on a value from this interval.
This corresponds to the interval probabilities (2.48) given by a density. By comparing
the probability histogram with the theoretical densities proposed in section 2.3.4, one
gets a first hint at the type of the probability distribution of Y. For instance, when
comparing the histogram (Figure 2.6 b) with the density of the exponential distribution
(Figure 2.5 b), this distribution can immediately be excluded as a suitable model. □


By partitioning the 60 ball weights into classes in the previous example, information
about the probability distribution of the ball weights was lost. No information is lost
when defining an empirical distribution function F_n(x) of Y based on a sample of
size n (i.e., the results of n repetitions of a random experiment with outcome Y have
been registered) as follows:

$$F_n(x) = \frac{n(x)}{n},$$

where n(x) is the number of values in the sample which are equal to or smaller than x.


Theorem of I. V. Glivenko: F_n(x) tends to F(x) = P(Y ≤ x) as n → ∞ in the following
sense: If $G_n = \sup_{x \in R} |F_n(x) - F(x)|$, where R is the range of Y, then

$$P\left(\lim_{n \to \infty} G_n = 0\right) = 1.$$

Note that F_n(x) has jumps of size 1/n at each sample value (or a multiple of 1/n if a
value occurs several times in the sample).
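
The empirical distribution function of the 60 ball weights of Table 2.3 can be sketched as follows. The function name `empirical_cdf` is an illustrative choice; the counting of sample values not exceeding x is done with the standard library's bisection search on the sorted sample.

```python
from bisect import bisect_right

# Ball weights [g] as listed in Table 2.3
weights = [
    5.77, 5.82, 5.70, 5.78, 5.70, 5.62, 5.66, 5.66, 5.64, 5.76,
    5.73, 5.80, 5.76, 5.76, 5.68, 5.66, 5.62, 5.72, 5.70, 5.78,
    5.76, 5.67, 5.70, 5.72, 5.81, 5.79, 5.78, 5.66, 5.76, 5.72,
    5.70, 5.78, 5.76, 5.70, 5.76, 5.76, 5.62, 5.68, 5.74, 5.74,
    5.81, 5.66, 5.72, 5.74, 5.64, 5.79, 5.72, 5.82, 5.74, 5.73,
    5.81, 5.77, 5.60, 5.72, 5.78, 5.76, 5.74, 5.70, 5.64, 5.78,
]

def empirical_cdf(sample):
    """Return F_n with F_n(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

F60 = empirical_cdf(weights)
print(F60(5.59), F60(5.62), F60(5.70), F60(5.82))
```

Unlike the class-based distribution function of Figure 2.6, this function uses every individual measurement.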


2.3.2 Distribution Parameters 


The probability distribution function and/or the density of a continuous random vari- 
able X contain all the information on X. But, as with discrete random variables, to get 
fast information on essential features of a random variable or its probability distribu- 
tion, it is desirable to condense as much as possible of this information into some nu- 
merical parameters. Their interpretation is the same as for discrete random variables. 
Remember that a random variable X can be interpreted as the outcome of a random 
experiment. The mean value gives information on the average outcome of the random 
experiment in a long series of repetitions. The characteristic feature of the median is 
that, in a long series of repetitions of the random experiment, on average 50% of its 
outcomes are to the left of the median and 50% to the right. Hence, mean value and 
median characterize the central tendency of X . 


Mean Value The mean value (mean, expected value) of X is defined as

$$E(X) = \int_{-\infty}^{+\infty} x\,f(x)\,dx \qquad (2.51)$$

on condition that $\int_{-\infty}^{+\infty} |x|\,f(x)\,dx < \infty$.
The condition makes sure that the integral (2.51) exists. Note that

$$E(|X|) = \int_{-\infty}^{+\infty} |x|\,f(x)\,dx.$$


Formula (2.51) can be derived from the definition of the mean value of a discrete
random variable (2.7): For simplicity of notation, let the range of X be R = [0, ∞). R is
partitioned in intervals I_k of length Δ as follows:

$$I_k = (k\Delta, (k+1)\Delta], \quad k = 0, 1, ....$$

Let $\tilde{X}$ be a discrete random variable which takes on a value x_k from each I_k with
probability p_k = F((k+1)Δ) - F(kΔ); k = 0, 1, ... Then, by (2.7) and (2.50), as Δ → 0,

$$E(\tilde{X}) = \sum_{k=0}^{\infty} x_k\,p_k = \sum_{k=0}^{\infty} x_k \int_{k\Delta}^{(k+1)\Delta} f(x)\,dx
\;\to\; \int_0^{\infty} x\,f(x)\,dx = E(X).$$
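For nonnegative continuous random variables, the analogue to formula (2.10) is

$$E(X) = \int_0^{\infty} [1 - F(x)]\,dx. \qquad (2.52)$$

This formula is verified by partial integration as follows:

$$E(X) = \int_0^{\infty} x\,f(x)\,dx = \lim_{t \to \infty} \int_0^{t} x\,f(x)\,dx
= \lim_{t \to \infty} \left[ t\,F(t) - \int_0^{t} F(x)\,dx \right]
= \lim_{t \to \infty} \int_0^{t} [F(t) - F(x)]\,dx = \int_0^{\infty} [1 - F(x)]\,dx.$$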


From (2.51) one gets analogously by partial integration the mean value of a random
variable X with range R = (-∞, +∞) as

$$E(X) = \int_0^{\infty} [1 - F(x)]\,dx - \int_{-\infty}^{0} F(x)\,dx.$$


If h(x) is a real function and X any continuous random variable with density f(x),
then the mean value of the random variable Y = h(X) can directly be obtained from
the density of X:

$$E(h(X)) = \int_{-\infty}^{+\infty} h(x)\,f(x)\,dx. \qquad (2.53)$$

If h(x) = ax + b with constants a and b, then Y = aX + b and

$$E(aX + b) = aE(X) + b. \qquad (2.54)$$

If both X and h(x) are nonnegative and h(0) = 0, one obtains by partial integration of
(2.53) a formula for E(h(X)) which generalizes formula (2.52):

$$E(h(X)) = \int_0^{\infty} [1 - F(x)]\,dh(x) = \int_0^{\infty} [1 - F(x)]\,h'(x)\,dx, \qquad (2.55)$$

where h'(x) denotes the first derivative of h(x) (assuming its existence).

Moments By specifying h(x), formula (2.53) yields the moments of X:
The (ordinary) n th moment of X is the mean value of X^n:

$$\mu_n = E(X^n) = \int_{-\infty}^{+\infty} x^n f(x)\,dx, \quad n = 0, 1, .... \qquad (2.56)$$

In particular, μ_0 = 1 and μ_1 = E(X).


The n th (ordinary) central moment of X is

$$m_n = E\left((X - E(X))^n\right) = \int_{-\infty}^{+\infty} (x - E(X))^n f(x)\,dx, \quad n = 0, 1, ..., \qquad (2.57)$$

and the n th absolute central moment of X is

$$M_n = E\left(|X - E(X)|^n\right) = \int_{-\infty}^{+\infty} |x - E(X)|^n f(x)\,dx, \quad n = 0, 1, .... \qquad (2.58)$$


Median The median of a continuous random variable X with distribution function
F(x) is defined as that value x_{0.5} of X which satisfies F(x_{0.5}) = 0.5.


Hence, in a long series of experiments with outcome X about 50% of the results will
be to the left of x_{0.5} and 50% to the right of x_{0.5} (Figure 2.7). One may expect that
x_{0.5} = E(X). But this is not generally true as the following example shows.

Example 2.13 Let X have an exponential distribution with parameter λ (see example
2.11), i.e., $F(x) = 1 - e^{-\lambda x}$, x ≥ 0. Then, by formula (2.52),

$$E(X) = \int_0^{\infty} e^{-\lambda x}\,dx = 1/\lambda.$$

Now, let h(x) = x^2. Then, by (2.55), the second moment of X becomes

$$E(X^2) = \int_0^{\infty} e^{-\lambda x}\,2x\,dx = 2 \int_0^{\infty} x\,e^{-\lambda x}\,dx
= -\frac{2}{\lambda^2} \left[ (\lambda x + 1)\,e^{-\lambda x} \right]_0^{\infty}
= -\frac{2}{\lambda^2}\,[0 - 1] = 2/\lambda^2.$$

The median x_{0.5} is the solution of the equation $1 - e^{-\lambda x_{0.5}} = 0.5$ so that

$$x_{0.5} = 0.6931/\lambda.$$

Thus, for the exponential distribution, x_{0.5} < E(X) and E(X^2) > [E(X)]^2. □


Percentile The α-percentile (also denoted as α-quantile) of a continuous random
variable X is defined as that value x_α of X which satisfies

$$F(x_\alpha) = \alpha, \quad 0 < \alpha < 1. \qquad (2.59)$$

Hence, in a long series of experiments with outcome X, about 100α% of the results will
be to the left of x_α and 100(1 - α)% to the right of x_α (Figure 2.7). Thus, the median is
the 0.5-percentile of X or of its probability distribution, respectively.
Percentiles are important criteria in quality control. For instance, for an exponentially
distributed lifetime, what should the mean life of an electronic part be so that 95% of
these parts operate at least 5 years without failure? The mean life is μ = 1/λ so that μ
must satisfy $P(X \ge 5) = e^{-5/\mu} \ge 0.95$. Therefore, μ ≥ 97.5 years.
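
For the exponential distribution, equation (2.59) can be solved explicitly, which gives a convenient check of the median of example 2.13 and of the mean-life requirement above. The following is a minimal sketch; the function name `exp_quantile` and the choice λ = 1 are illustrative assumptions.

```python
from math import exp, log

def exp_quantile(alpha, lam):
    """Solve F(x) = 1 - exp(-lam * x) = alpha for x."""
    return -log(1.0 - alpha) / lam

lam = 1.0                            # illustrative parameter value
median = exp_quantile(0.5, lam)      # equals ln(2) / lam
mean = 1.0 / lam
print(median, mean)                  # the median lies below the mean

# Smallest mean life mu with P(X >= 5) = exp(-5 / mu) >= 0.95
mu_min = 5.0 / -log(0.95)
print(round(mu_min, 1))
```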
Figure 2.7 Illustration of the percentiles


Mode A mode x_m of a continuous random variable X with density f(x) is a value at
which f(x) assumes a relative maximum. f(x) is unimodal if it has exactly one mode.
Otherwise it is called multimodal.

A density may have an uncountably infinite set of modes. This happens when the
density takes on a (relative) maximum over a whole interval. For a unimodal density
(in this case f(x) assumes its absolute maximum at x_m), most of the outcomes during a
long series of experiments will be in a neighborhood of x_m.


A function f(x) is said to be symmetric with symmetry center x_s if for all x

$$f(x_s - x) = f(x_s + x).$$

It is quite obvious that for a random variable X with a unimodal and symmetric
probability density f(x), median, mode and symmetry center coincide. If, in addition, the
mean value of X is finite, then

$$E(X) = x_{0.5} = x_m = x_s.$$


Figure 2.8 Density of the Laplace distribution


Example 2.14 The Laplace distribution, also called doubly exponential distribution,
has a symmetric density with symmetry center at x_s = μ (Figure 2.8):

$$f(x) = \frac{\lambda}{2}\,e^{-\lambda |x - \mu|}, \quad -\infty < x < \infty.$$

This density assumes its maximum at x_m = μ, namely f(μ) = λ/2. Thus,

$$E(X) = x_{0.5} = x_m = \mu. \quad □$$

In what follows, formulas for the measures of variability, introduced in section 2.2
for discrete random variables, are given for continuous random variables. Their
interpretation does not change.


Variance The variance of X is the mean value of the squared deviation of X from its
mean value E(X), i.e. the mean value of the random variable Y = (X - E(X))^2:

$$Var(X) = E\left((X - E(X))^2\right).$$

The calculation of this mean value does not require knowledge of the density of Y,
but can be done by (2.53) with h(x) = (x - E(X))^2:

$$Var(X) = \int_{-\infty}^{+\infty} (x - E(X))^2 f(x)\,dx. \qquad (2.60)$$

Thus, the variance of X is its 2nd central moment (equation 2.57).
If with constants a and b the random variable aX + b is of interest, then h(x) becomes

$$h(x) = (ax + b - aE(X) - b)^2 = a^2 (x - E(X))^2$$

so that

$$Var(aX + b) = a^2\,Var(X). \qquad (2.61)$$

There is an important relationship between the variance and the second moment of X:

$$Var(X) = E(X^2) - [E(X)]^2. \qquad (2.62)$$

The proof is identical to the one for the corresponding relationship for discrete
random variables (see formula 2.17).


Standard Deviation The standard deviation of X is the square root of Var(X). It is
frequently denoted as σ:

$$\sigma = \sqrt{Var(X)}.$$

Coefficient of Variation The coefficient of variation of X is defined as the ratio

$$V(X) = \sigma / E(X).$$

It follows from formulas (2.54) and (2.61) that X and aX (a > 0) have the same coefficient
of variation. More generally, since the coefficient of variation considers the values of X
in relation to their average size, this coefficient allows the variability of different
random variables to be compared.


An important measure of the variability is also the mean absolute linear deviation of
X from its mean value:

$$E(|X - E(X)|) = \int_{-\infty}^{+\infty} |x - E(X)|\,f(x)\,dx. \qquad (2.63)$$

This is the 1st absolute central moment of X as defined by (2.58):

$$M_1 = E(|X - E(X)|).$$

Figure 2.9 Distribution function and density for example 2.15


Example 2.15 Let X be the random emission of SO_2 [in 100 kg/h] of a chemical factory.
Its distribution function F(x) (density f(x)) over one day, starting at midnight,
has been found to be (Figure 2.9)

$$F(x) = \begin{cases} 0 & \text{for } x < 0, \\ \sqrt{x} & \text{for } 0 \le x \le 1, \\ 1 & \text{for } 1 < x, \end{cases} \qquad
f(x) = \begin{cases} 0 & \text{for } x < 0, \\ 0.5\,x^{-0.5} & \text{for } 0 < x \le 1, \\ 0 & \text{for } 1 < x. \end{cases}$$

The graph of the density shows that the bulk of the (illegal) emissions occurs
immediately after midnight. Later the emissions tend to the accepted values.

By (2.52), the mean value of X is

$$E(X) = \int_0^1 (1 - \sqrt{x})\,dx = \left[ x - \tfrac{2}{3}\,x^{3/2} \right]_0^1 = 1/3 \;\; [100 \text{ kg/h}].$$

This result and formulas (2.56) and (2.62) yield the second moment and the variance:

$$E(X^2) = \int_0^1 x^2\,0.5\,x^{-1/2}\,dx = 0.5 \int_0^1 x^{1.5}\,dx = 0.2,$$

$$\sigma^2 = Var(X) = 0.2 - (1/3)^2 = 0.0889.$$

Standard deviation and coefficient of variation are

$$\sigma = \sqrt{Var(X)} \approx 0.2981, \qquad V(X) = \sigma / E(X) = 0.8943 = 89.43\%.$$

The 1st absolute central moment of X is

$$M_1 = E(|X - 1/3|) = \int_0^1 |x - 1/3|\,0.5\,x^{-0.5}\,dx
= \int_0^{1/3} (1/3 - x)\,0.5\,x^{-0.5}\,dx + \int_{1/3}^{1} (x - 1/3)\,0.5\,x^{-0.5}\,dx = 0.1283 + 0.1283$$

so that E(|X - 1/3|) = 0.2566 [100 kg/h]. □
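
The moments of example 2.15 can be cross-checked numerically from the distribution function alone, using the tail formulas (2.52) and (2.55) with h(x) = x^2. The sketch below uses a simple midpoint rule; the helper name `midpoint_integral` and the number of steps are arbitrary choices, not part of the example.

```python
from math import sqrt

def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

F = sqrt                                   # F(x) = sqrt(x) on [0, 1]

# (2.52): E(X) is the integral of 1 - F(x)
mean = midpoint_integral(lambda x: 1 - F(x), 0.0, 1.0)
# (2.55) with h(x) = x^2, h'(x) = 2x
second = midpoint_integral(lambda x: (1 - F(x)) * 2 * x, 0.0, 1.0)
variance = second - mean**2                # (2.62)
sigma = sqrt(variance)
print(round(mean, 4), round(second, 4), round(variance, 4), round(sigma, 4))
```

Working with 1 - F(x) instead of the density avoids the singularity of f(x) at x = 0.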
1/3 


Continuation of Example 2.12 a) The probabilities p_i in example 2.12 are actually
assigned to the class numbers 1, 2, ..., 6. To be able to get quantitative information on
the ball weights, now the p_i are assigned to the middle points of the class intervals.
That means the original range of X, namely {1, 2, ..., 6}, is replaced with the range
{5.605, 5.645, 5.685, 5.725, 5.765, 5.805}. The choice of the middle points takes
into account that the classes do not contain their upper limit.
In this way, a discrete random variable has been generated, which approximates the
original continuous one, the weight of the balls. Mean value and variance of X are

E(X) = 5.605 · 0.0667 + 5.645 · 0.1333 + 5.685 · 0.1667 + 5.725 · 0.2167
     + 5.765 · 0.2833 + 5.805 · 0.1333

and

Var(X) = (5.605 - 5.722)^2 · 0.0667 + (5.645 - 5.722)^2 · 0.1333
       + (5.685 - 5.722)^2 · 0.1667 + (5.725 - 5.722)^2 · 0.2167
       + (5.765 - 5.722)^2 · 0.2833 + (5.805 - 5.722)^2 · 0.1333

so that

$$E(X) = 5.722, \quad Var(X) = 0.00343, \quad \sqrt{Var(X)} = 0.05857.$$

For the sake of comparison, the first absolute central moment is calculated:

E(|X - E(X)|) = |5.605 - 5.722| · 0.0667 + |5.645 - 5.722| · 0.1333
              + |5.685 - 5.722| · 0.1667 + |5.725 - 5.722| · 0.2167
              + |5.765 - 5.722| · 0.2833 + |5.805 - 5.722| · 0.1333
              = 0.0481.


By representing several values of the original data set by their average value, the
numerical effort is reduced, but some of the information contained in the data set is
lost. Based on the data set given, maximal information on the mean value and on the
variance of X is given by the arithmetic mean $\bar{x}$ and the empirical variance s^2,
respectively, which are calculated from the individual n = 60 values provided by Table 2.3:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \sum_{i=1}^{n} x_i^2 - \frac{n}{n-1}\,\bar{x}^2. \qquad (2.64)$$

The numerical results are, including the empirical standard deviation $s = \sqrt{s^2}$:

$$\bar{x} = 5.727, \quad s^2 = 0.0032, \quad \text{and} \quad s = 0.0566.$$

Directly from the data set, the empirical mean absolute deviation is given by

$$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| = \frac{1}{60} \sum_{i=1}^{60} |x_i - 5.727| = 0.0475.$$
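
The sample characteristics (2.64) can be evaluated directly from the 60 measurements of Table 2.3, for instance as sketched below. The variable names are illustrative; `statistics.variance` is the standard library's sample variance with divisor n - 1. The printed results should be close to the rounded values quoted above, although the last digit may differ slightly.

```python
from statistics import mean, variance

# Ball weights [g] as listed in Table 2.3
weights = [
    5.77, 5.82, 5.70, 5.78, 5.70, 5.62, 5.66, 5.66, 5.64, 5.76,
    5.73, 5.80, 5.76, 5.76, 5.68, 5.66, 5.62, 5.72, 5.70, 5.78,
    5.76, 5.67, 5.70, 5.72, 5.81, 5.79, 5.78, 5.66, 5.76, 5.72,
    5.70, 5.78, 5.76, 5.70, 5.76, 5.76, 5.62, 5.68, 5.74, 5.74,
    5.81, 5.66, 5.72, 5.74, 5.64, 5.79, 5.72, 5.82, 5.74, 5.73,
    5.81, 5.77, 5.60, 5.72, 5.78, 5.76, 5.74, 5.70, 5.64, 5.78,
]

x_bar = mean(weights)                      # arithmetic mean
s2 = variance(weights)                     # empirical variance, divisor n - 1
s = s2 ** 0.5                              # empirical standard deviation
mad = sum(abs(x - x_bar) for x in weights) / len(weights)
print(round(x_bar, 3), round(s2, 4), round(s, 4), round(mad, 4))
```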


b) The frequency histogram of Figure 2.6 suggests a suitable empirical density f_60(y)
with respect to class intervals of length 1:

$$f_{60}(y) = \begin{cases}
0 & \text{if } y < 2/3, \\
\frac{3}{145}\,(3y - 2) & \text{if } 2/3 \le y \le 5.5, \\
\frac{3}{55}\,(-3y + 22) & \text{if } 5.5 < y \le 22/3, \\
0 & \text{if } 22/3 < y.
\end{cases}$$

Having assigned length 1 to all class intervals formally means that the variables x and
y in Figure 2.10 are related by the linear transformation y = 25x - 138.75, or, in terms
of the corresponding variables Y and X:

$$Y = 25X - 138.75 \quad \text{or} \quad X = 0.04\,Y + 5.55. \qquad (2.65)$$

Figure 2.10 Probability histogram and empirical density for example 2.12


First of all, it has to be shown that f_60(y) is indeed a probability density, i.e., it has
to be shown that the area A of the triangle is equal to 1: Since it is composed of 2
right triangles, there is no need for integration:

$$A = \tfrac{1}{2} \cdot 0.3 \cdot (5.5 - 2/3) + \tfrac{1}{2} \cdot 0.3 \cdot (22/3 - 5.5) = 1.$$

This empirical density allows the calculation of estimates for the distribution
parameters by the formulas given in this section.
The mean value of Y is

$$E(Y) = \int_{2/3}^{22/3} y\,f_{60}(y)\,dy = \frac{3}{145} \int_{2/3}^{5.5} y\,(3y - 2)\,dy + \frac{3}{55} \int_{5.5}^{22/3} y\,(-3y + 22)\,dy$$

$$= \frac{3}{145} \left[ y^3 - y^2 \right]_{2/3}^{5.5} + \frac{3}{55} \left[ -y^3 + 11 y^2 \right]_{5.5}^{22/3} = 4.4965.$$

By formulas (2.54) and (2.65), E(X) = 0.04 E(Y) + 5.55 so that

E(X) = 0.04 · 4.4965 + 5.55 = 5.729.

By formula (2.62), an estimate of the variance of Y is

$$Var(Y) = \int_{2/3}^{22/3} y^2 f_{60}(y)\,dy - [E(Y)]^2$$

$$= \frac{3}{145} \int_{2/3}^{5.5} y^2 (3y - 2)\,dy + \frac{3}{55} \int_{5.5}^{22/3} y^2 (-3y + 22)\,dy - [4.4965]^2$$

$$= \frac{3}{145} \left[ \tfrac{3}{4} y^4 - \tfrac{2}{3} y^3 \right]_{2/3}^{5.5} + \frac{3}{55} \left[ -\tfrac{3}{4} y^4 + \tfrac{22}{3} y^3 \right]_{5.5}^{22/3} - [4.4965]^2 = 2.0083.$$

Hence, by formulas (2.61) and (2.65),

$$Var(X) = 0.04^2\,Var(Y) = 0.003213.$$

By (2.63), the mean absolute linear deviation of Y from E(Y) is

$$E(|Y - E(Y)|) = \int_{2/3}^{22/3} |y - 4.4965|\,f_{60}(y)\,dy$$

$$= \frac{3}{145} \int_{2/3}^{4.4965} (4.4965 - y)(3y - 2)\,dy + \frac{3}{145} \int_{4.4965}^{5.5} (y - 4.4965)(3y - 2)\,dy$$

$$+ \frac{3}{55} \int_{5.5}^{22/3} (y - 4.4965)(-3y + 22)\,dy$$

= 0.58111 + 0.14060 + 0.44402 = 1.16573.

Hence, E(|X - E(X)|) = 0.04 E(|Y - E(Y)|) = 0.04663. □
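
The quantities derived from the empirical density can be checked by numerical integration. The sketch below assumes the triangular form of f_60 with support [2/3, 22/3] and peak value 0.3 at y = 5.5; the helper `midpoint_integral` is an illustrative name. For a triangular density the mean is exactly (2/3 + 22/3 + 5.5)/3 = 4.5, so the numerical results are close to, though not identical with, the rounded figures in the text.

```python
def f60(y):
    """Triangular empirical density with support [2/3, 22/3], peak 0.3 at 5.5."""
    if 2 / 3 <= y <= 5.5:
        return 3 / 145 * (3 * y - 2)
    if 5.5 < y <= 22 / 3:
        return 3 / 55 * (-3 * y + 22)
    return 0.0

def midpoint_integral(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

a, b = 2 / 3, 22 / 3
area = midpoint_integral(f60, a, b)
mean_y = midpoint_integral(lambda y: y * f60(y), a, b)
var_y = midpoint_integral(lambda y: (y - mean_y) ** 2 * f60(y), a, b)
mad_y = midpoint_integral(lambda y: abs(y - mean_y) * f60(y), a, b)

# Back-transformation X = 0.04 Y + 5.55, see (2.65)
print(area, mean_y, 0.04 * mean_y + 5.55)
print(var_y, 0.04 ** 2 * var_y, 0.04 * mad_y)
```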


Truncation Most of the probability distributions for random variables have ranges
[0, ∞) or (-∞, +∞), respectively. If, however, for whatever reason a random
variable X, which is supposed to have distribution function F(x), can only take on values
from an interval [c, d], then a truncation of the range of X or its distribution,
respectively, makes sense. This is done by replacing F(x) = P(X ≤ x) with the
conditional distribution function $F_{[c,d]}(x) = P(X \le x \mid c \le X \le d)$. By formula (1.22),

$$F_{[c,d]}(x) = \begin{cases}
0 & \text{if } x < c, \\
\dfrac{F(x) - F(c)}{F(d) - F(c)} & \text{if } c \le x \le d, \\
1 & \text{if } d < x.
\end{cases} \qquad (2.66)$$

For instance, when the exponential distribution (example 2.11) is truncated with
regard to the interval [c, d], then

$$F_{[c,d]}(x) = \begin{cases}
0 & \text{if } x < c, \\
\dfrac{e^{-\lambda c} - e^{-\lambda x}}{e^{-\lambda c} - e^{-\lambda d}} & \text{if } c \le x \le d, \\
1 & \text{if } d < x.
\end{cases} \qquad (2.67)$$

Most important is the special case c = 0. Then,

$$F_{[0,d]}(x) = \begin{cases}
0 & \text{if } x < 0, \\
\dfrac{1 - e^{-\lambda x}}{1 - e^{-\lambda d}} & \text{if } 0 \le x \le d, \\
1 & \text{if } d < x.
\end{cases} \qquad (2.68)$$

Truncation is actually a very adequate tool to tailor probability distributions to the
respective application. Although, as mentioned above, most of the common probability
distributions have unbounded ranges (at least to the right), unbounded random
variables are unrealistic (even impossible) outcomes of random experiments like
determining life-, repair-, and service times or measurement errors.
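
Formula (2.66) translates almost literally into code. The following sketch wraps an arbitrary distribution function F into its truncated version and applies it to an exponential distribution; the function name `truncated_cdf` and the parameter values λ = 0.5 and d = 4 are purely illustrative.

```python
from math import exp

def truncated_cdf(F, c, d):
    """Distribution function of X conditioned on c <= X <= d, see (2.66)."""
    Fc, Fd = F(c), F(d)
    def F_trunc(x):
        if x < c:
            return 0.0
        if x > d:
            return 1.0
        return (F(x) - Fc) / (Fd - Fc)
    return F_trunc

lam = 0.5                                    # illustrative parameter value
F_exp = lambda x: 1.0 - exp(-lam * x) if x >= 0 else 0.0
F_trunc = truncated_cdf(F_exp, 0.0, 4.0)     # special case c = 0, see (2.68)
print(F_trunc(0.0), F_trunc(2.0), F_trunc(4.0))
```

The wrapper only requires F(d) > F(c), so the same helper can be reused for other distribution functions.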
Standardization A random variable S (discrete or continuous) with

E(S) = 0 and Var(S) = 1

is called a standard random variable.


In view of formulas (2.54) and (2.61), for any random variable X with finite mean
value μ = E(X) and variance σ^2 = Var(X), the linear transformation of X given by

$$S = \frac{X - \mu}{\sigma} \qquad (2.69)$$

or, equivalently, by

$$S = \frac{1}{\sigma}\,X - \frac{\mu}{\sigma}$$

is a standard random variable. S is called the standardization or normalization of X.


Skewness In case of a continuous random variable, its distribution is symmetric if and only if its density is a symmetric function. The skewness of a distribution measures the degree of asymmetry of arbitrary probability distributions, including discrete ones. (Remember that the skewness of a discrete probability distribution is visualized by its histogram.) The two most popular skewness criteria are Charlier's skewness $\gamma_C$ and Pearson's skewness $\gamma_P$:

$$\gamma_C = \frac{m_3}{\sigma^3}, \qquad \gamma_P = \frac{\mu - x_m}{\sigma},$$

where $\mu$, $m_3$, $x_m$, and $\sigma$ are, in this order, mean value, third central moment (see formula (2.57)), mode, and standard deviation of $X$. For symmetric distributions both criteria are equal to 0. They are positive if the density is skewed to the right ('long tail' of the density to the right (Figure 2.11)) and negative if the density is skewed to the left ('long tail' of the density to the left).

Charlier's skewness is invariant to the linear transformation (2.69), i.e., invariant to standardization. That means $X$ and its standardization $(X - E(X))/\sigma$ have the same skewness if measured by $\gamma_C$.
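
The invariance of Charlier's skewness under standardization can be checked on a small discrete distribution. The sketch below assumes the usual definition of Charlier's skewness as the third central moment divided by $\sigma^3$; the probability mass function is a hypothetical example, not data from the text.

```python
def charlier_skewness(values, probs):
    """Third central moment divided by sigma**3 for a discrete distribution."""
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    m3 = sum(p * (x - mu) ** 3 for x, p in zip(values, probs))
    return m3 / var ** 1.5

values = [0.0, 1.0, 2.0, 5.0]      # hypothetical values with a long tail to the right
probs = [0.4, 0.3, 0.2, 0.1]

mu = sum(p * x for x, p in zip(values, probs))
sigma = sum(p * (x - mu) ** 2 for x, p in zip(values, probs)) ** 0.5
standardized = [(x - mu) / sigma for x in values]   # values of S = (X - mu) / sigma

gamma_x = charlier_skewness(values, probs)
gamma_s = charlier_skewness(standardized, probs)
```

Both `gamma_x` and `gamma_s` coincide up to rounding error, and they are positive for this right-skewed example.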


Figure 2.11 Asymmetric density skewed to the right (density f(x) over x, with the mode marked)

2.3.3 Important Continuous Probability Distributions


In this section some important probability distributions of continuous random variables $X$ will be listed. If the distribution function is not explicitly given, it can only be represented as an integral over the density.


Note: In what follows, the areas where the distribution function is 0 or 1 or, equivalently, the density is 0, are no longer explicitly taken into account when specifying the domains of definition of these functions.


Figure 2.12 Distribution function a) and density b) for the uniform distribution


Uniform Distribution A random variable $X$ has a uniform distribution over the finite interval (range) $[c, d]$ with $c < d$ if it has distribution function and density

$$F(x) = \frac{x - c}{d - c}, \quad c \le x \le d, \qquad f(x) = \frac{1}{d - c}, \quad c \le x \le d.$$


Thus, for any subinterval $[a, b]$ of $[c, d]$, the corresponding interval probability is

$$P(a \le X \le b) = \frac{b - a}{d - c}.$$

This probability depends only on the length of the interval $[a, b]$, but not on its position within the interval $[c, d]$, i.e., all subintervals of $[c, d]$ of the same length have the same probability of containing the value of $X$.


Mean value and variance of $X$ are

$$E(X) = \frac{c + d}{2}, \qquad \operatorname{Var}(X) = \frac{1}{12}(d - c)^2.$$


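Power Distribution A random variable $X$ has a power distribution with finite range $[0, \tau]$ if it has distribution function and density (Figure 2.13)

$$F(x) = \left(\frac{x}{\tau}\right)^{\alpha}, \qquad f(x) = \frac{\alpha}{\tau}\left(\frac{x}{\tau}\right)^{\alpha - 1}, \qquad 0 \le x \le \tau, \ \alpha > 0, \ \tau > 0.$$

Mean value and variance are

$$E(X) = \frac{\alpha\tau}{\alpha + 1}, \qquad \operatorname{Var}(X) = \frac{\alpha\tau^2}{(\alpha + 1)^2(\alpha + 2)}, \qquad \alpha > 0, \ \tau > 0.$$

The uniform distribution with range $[0, \tau]$ is seen to be the special case $\alpha = 1$.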

Figure 2.13 Density of the power distribution

Note: $\tau$ is a scale parameter, i.e., without loss of generality $\tau = 1$ can be chosen as the unit of measurement. $\alpha$ is the shape or form parameter of this distribution, since $\alpha$ determines the shape of the graph of the density.


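Pareto Distribution A random variable $X$ has a Pareto distribution with range $[\tau, \infty)$ if it has distribution function and density

$$F(x) = 1 - \left(\frac{\tau}{x}\right)^{\alpha}, \qquad f(x) = \frac{\alpha}{\tau}\left(\frac{\tau}{x}\right)^{\alpha + 1}, \qquad x \ge \tau, \ \alpha > 0, \ \tau > 0.$$

Mean value and variance are

$$E(X) = \frac{\alpha\tau}{\alpha - 1}, \ \alpha > 1, \qquad \operatorname{Var}(X) = \frac{\alpha\tau^2}{(\alpha - 1)^2(\alpha - 2)}, \ \alpha > 2.$$

For $\alpha \le 1$ and $\alpha \le 2$ mean value and variance, respectively, do not exist, i.e., they are not finite.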


Figure 2.14 Density of the Pareto distribution


Cauchy Distribution A random variable $X$ has a Cauchy distribution with parameters $\lambda$ and $\mu$ if it has density

$$f(x) = \frac{\lambda}{\pi\left[\lambda^2 + (x - \mu)^2\right]}, \qquad -\infty < x < \infty, \ \lambda > 0, \ -\infty < \mu < \infty.$$

This distribution is symmetric with symmetry center $\mu$. Mean value and variance do not exist as finite quantities.

Figure 2.15 Density of the Cauchy distribution


Exponential Distribution A random variable $X$ has an exponential distribution with (scale) parameter $\lambda$ if it has distribution function and density (Figure 2.5, page 60)

$$F(x) = 1 - e^{-\lambda x}, \qquad f(x) = \lambda e^{-\lambda x}, \qquad \lambda > 0, \ x \ge 0. \tag{2.70}$$

Mean value and variance are

$$E(X) = 1/\lambda, \qquad \operatorname{Var}(X) = 1/\lambda^2. \tag{2.71}$$


Erlang Distribution A random variable $X$ has an Erlang distribution with parameters $\lambda$ and $n$ if it has distribution function and density

$$F(x) = 1 - e^{-\lambda x} \sum_{i=0}^{n-1} \frac{(\lambda x)^i}{i!} = e^{-\lambda x} \sum_{i=n}^{\infty} \frac{(\lambda x)^i}{i!}, \tag{2.72}$$

$$f(x) = \lambda\,\frac{(\lambda x)^{n-1}}{(n-1)!}\, e^{-\lambda x}, \qquad x \ge 0, \ \lambda > 0, \ n = 1, 2, \ldots \tag{2.73}$$

Mean value and variance are

$$E(X) = n/\lambda, \qquad \operatorname{Var}(X) = n/\lambda^2.$$

The exponential distribution is a special case of the Erlang distribution for $n = 1$. The relationship between the Erlang distribution and the Poisson distribution with parameter $\lambda$ is obvious, since the right-hand side of (2.72) is the probability that at least $n$ Poisson events occur in the interval $[0, x]$ (formula (2.39), page 56).
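
The link between the Erlang distribution function and the Poisson distribution can be verified numerically. The following Python sketch evaluates the finite-sum form of the Erlang distribution function and compares it with a truncated Poisson tail sum with mean $\lambda x$. Function names, the truncation length, and the parameter values are illustrative assumptions.

```python
import math

def erlang_cdf(x, lam, n):
    """Erlang distribution function: 1 - exp(-lam*x) * sum_{i=0}^{n-1} (lam*x)**i / i!."""
    partial = sum((lam * x) ** i / math.factorial(i) for i in range(n))
    return 1.0 - math.exp(-lam * x) * partial

def poisson_at_least(n, mean, extra_terms=100):
    """P(N >= n) for a Poisson count N with the given mean (series truncated)."""
    term = math.exp(-mean) * mean ** n / math.factorial(n)   # P(N = n)
    total = 0.0
    for i in range(n, n + extra_terms):
        total += term
        term *= mean / (i + 1)        # P(N = i + 1) obtained from P(N = i)
    return total

lam, x, n = 1.5, 2.0, 3               # illustrative values only
left = erlang_cdf(x, lam, n)          # finite-sum form
right = poisson_at_least(n, lam * x)  # probability of at least n Poisson events
```

For $n = 1$ the function `erlang_cdf` reduces to the exponential distribution function $1 - e^{-\lambda x}$.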


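Gamma Distribution A random variable $X$ has a gamma distribution with parameters $\alpha$ and $\beta$ if it has density (Figure 2.16)

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \qquad x > 0, \ \alpha > 0, \ \beta > 0, \tag{2.74}$$

where the gamma function $\Gamma(y)$ is defined by

$$\Gamma(y) = \int_0^{\infty} t^{y-1} e^{-t}\,dt, \qquad y > 0. \tag{2.75}$$

Mean value, variance, mode and Charlier's skewness are

$$E(X) = \alpha/\beta, \quad \operatorname{Var}(X) = \alpha/\beta^2, \quad x_m = (\alpha - 1)/\beta, \quad \gamma_C = 2/\sqrt{\alpha}. \tag{2.76}$$

Special cases: Exponential distribution for $\alpha = 1$ and $\beta = \lambda$, Erlang distribution for $\alpha = n$ and $\beta = \lambda$.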

Figure 2.16 Densities of the gamma distribution


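Beta Distribution A random variable $X$ has a beta distribution with range $(c, d)$ and parameters $\alpha$ and $\beta$ if it has density

$$f(x) = \frac{(d - c)^{1 - \alpha - \beta}}{B(\alpha, \beta)}\,(x - c)^{\alpha - 1}(d - x)^{\beta - 1}, \qquad c < x < d, \ \alpha > 0, \ \beta > 0,$$

where the beta function $B(\alpha, \beta)$ is defined as

$$B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx.$$

An equivalent representation of the beta function is

$$B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}, \qquad x > 0, \ y > 0.$$

Mean value and variance are

$$E(X) = c + (d - c)\,\frac{\alpha}{\alpha + \beta}, \qquad \operatorname{Var}(X) = \frac{(d - c)^2\,\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$

The mode of this distribution is

$$x_m = c + (d - c)\,\frac{\alpha - 1}{\alpha + \beta - 2} \qquad \text{for } \alpha \ge 1, \ \beta \ge 1, \text{ and } \alpha + \beta > 2.$$

A special case is the uniform distribution in $[c, d]$ if $\alpha = \beta = 1$.

If $X$ has a beta distribution on the interval $(c, d)$, then $Y = (X - c)/(d - c)$ has a beta distribution on the interval $(0, 1)$. Hence, it is sufficient to consider the beta distribution with range $(0, 1)$. The corresponding density is

$$f(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}, \qquad 0 < x < 1, \ \alpha > 0, \ \beta > 0.$$

Figure 2.17 Densities of the beta distribution over (0, 1) (panels: $\alpha = 2$, $\beta = 3$ and $\alpha = 1/2$, $\beta = 1$)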
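
Weibull Distribution A random variable $X$ has a Weibull distribution with scale parameter $\theta$ and shape parameter $\beta$ (2-parameter Weibull distribution) if it has distribution function and density (Figure 2.18)

$$F(x) = 1 - e^{-(x/\theta)^{\beta}}, \qquad f(x) = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1} e^{-(x/\theta)^{\beta}}, \qquad x \ge 0, \ \beta > 0, \ \theta > 0. \tag{2.77}$$

Mean value and variance are

$$E(X) = \theta\,\Gamma\!\left(\tfrac{1}{\beta} + 1\right), \qquad \operatorname{Var}(X) = \theta^2\left[\Gamma\!\left(\tfrac{2}{\beta} + 1\right) - \left(\Gamma\!\left(\tfrac{1}{\beta} + 1\right)\right)^2\right]. \tag{2.78}$$

Special cases: Exponential distribution if $\theta = 1/\lambda$ and $\beta = 1$. Rayleigh distribution if $\beta = 2$. Distribution function, density, and parameters of the Rayleigh distribution are

$$F(x) = 1 - e^{-(x/\theta)^2}, \qquad f(x) = \frac{2x}{\theta^2}\, e^{-(x/\theta)^2}, \qquad x \ge 0, \ \theta > 0, \tag{2.79}$$

$$E(X) = \theta\sqrt{\pi/4}, \qquad \operatorname{Var}(X) = \theta^2(1 - \pi/4). \tag{2.80}$$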
Figure 2.18 Densities of the Weibull distribution (curves for $\beta < 1$ and $\beta > 1$)


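3-parameter Weibull distribution A random variable $X$ has a 3-parameter Weibull distribution with parameters $\alpha$, $\beta$, and $\theta$ if it has distribution function and density

$$F(x) = \begin{cases} 0 & \text{for } x < \alpha, \\[4pt] 1 - e^{-\left(\frac{x - \alpha}{\theta}\right)^{\beta}} & \text{for } \alpha \le x, \end{cases} \qquad f(x) = \begin{cases} 0 & \text{for } x < \alpha, \\[4pt] \dfrac{\beta}{\theta}\left(\dfrac{x - \alpha}{\theta}\right)^{\beta - 1} e^{-\left(\frac{x - \alpha}{\theta}\right)^{\beta}} & \text{for } \alpha \le x. \end{cases}$$

$\alpha$ is a parameter of location, since $X$ cannot assume values smaller than $\alpha$.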

Remark The Weibull distribution was found by the German mining engineers E. Rosin and E. Rammler in the late twenties of the past century when investigating the distribution of the size of stone, coal, and other particles after a grinding process (see, for example, Rosin, Rammler (1931)). Hence, in the mining engineering literature, the Weibull distribution is called Rosin-Rammler distribution. The Swedish engineer W. Weibull came across this distribution type when investigating mechanical wear in the early thirties of the past century.


Example 2.16 By a valid standard, the useful life $X$ of front tires of a certain type of truck comes to an end when their tread depth has decreased to 5 mm. From a large sample of $n = 120$ useful lives of front tires, taken under average usage conditions, the mean useful life was determined to be 2 years. The histogram of the same sample also justifies the assumption that $X$ has a Rayleigh distribution.

a) What is the probability of the random event $A$ that the useful life of a tire exceeds 2.4 years?

By (2.80), the unknown parameter $\theta$ of the Rayleigh distribution can be obtained from the equation $E(X) = 2 = \theta\sqrt{\pi/4}$. It follows that $\theta = 2.25676$. Hence, by (2.79),

$$P(A) = P(X > 2.4) = e^{-(2.4/\theta)^2} = 0.3227.$$

b) What is the probability of $A$ on condition that a tire has not yet reached the end of its useful life after 2 years of usage? From the formula of the conditional probability (1.22), the desired probability is

$$P(A \mid X > 2) = P(X > 2.4 \mid X > 2) = \frac{1 - F(2.4)}{1 - F(2)} = \frac{e^{-(2.4/\theta)^2}}{e^{-(2/\theta)^2}} = e^{-(2.4^2 - 2^2)/\theta^2} = 0.7078. \qquad \square$$
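
The computations of this example can be reproduced with a few lines of Python. This is a minimal sketch that evaluates the Rayleigh survival probability $e^{-(x/\theta)^2}$ from (2.79), with the square in the exponent; the variable and function names are illustrative.

```python
import math

mean_life = 2.0                                    # mean useful life in years
theta = mean_life / math.sqrt(math.pi / 4.0)       # from E(X) = theta * sqrt(pi / 4)

def rayleigh_survival(x, theta):
    """P(X > x) = exp(-(x / theta)**2) for the Rayleigh distribution."""
    return math.exp(-(x / theta) ** 2)

p_a = rayleigh_survival(2.4, theta)                              # part a)
p_a_given_2 = p_a / rayleigh_survival(2.0, theta)                # part b)
```

With these definitions, `theta` is about 2.2568, `p_a` about 0.3227, and `p_a_given_2` about 0.7078.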


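Normal Distribution A random variable $X$ has a normal (or Gaussian) distribution with parameters $\mu$ and $\sigma^2$ if it has density (Figure 2.19)

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < +\infty, \ -\infty < \mu < +\infty, \ \sigma > 0. \tag{2.81}$$

The corresponding distribution function can only be given as an integral, since $f(x)$ has no antiderivative that can be expressed by elementary functions:

$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{(y - \mu)^2}{2\sigma^2}}\,dy, \qquad -\infty < x < +\infty. \tag{2.82}$$

As the notation of the parameters indicates, mean value and variance are

$$E(X) = \mu, \qquad \operatorname{Var}(X) = \sigma^2. \tag{2.83}$$

The mean absolute deviation of $X$ from $E(X)$ is

$$E(|X - E(X)|) = \sqrt{2/\pi}\;\sigma \approx 0.798\,\sigma. \tag{2.84}$$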
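
Figure 2.19 Density of the normal distribution (Gaussian bell curve)


Formula (2.84) can be seen as follows: The substitution $y = (x - \mu)/\sigma$ in

$$E(|X - E(X)|) = \int_{-\infty}^{+\infty} |x - \mu|\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)}\,dx$$

yields

$$\begin{aligned} E(|X - E(X)|) &= \int_{-\infty}^{+\infty} |y|\, \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, \sigma\,dy \\ &= \frac{\sigma}{\sqrt{2\pi}} \left[ \int_{-\infty}^{0} (-y)\, e^{-y^2/2}\,dy + \int_{0}^{\infty} y\, e^{-y^2/2}\,dy \right] \\ &= \frac{2\sigma}{\sqrt{2\pi}} \int_{0}^{\infty} y\, e^{-y^2/2}\,dy = \frac{2\sigma}{\sqrt{2\pi}} \left[ -e^{-y^2/2} \right]_0^{\infty} = \sqrt{2/\pi}\;\sigma. \end{aligned}$$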


The density $f(x)$ is positive on the whole real axis. It is symmetric with symmetry center $x_s = \mu$ and has points of inflection at $x_1 = \mu - \sigma$ and $x_2 = \mu + \sigma$.


In the intervals $[\mu - k\sigma, \mu + k\sigma]$, $k = 1, 2, 3$, $X$ assumes values with probabilities

$$P(\mu - \sigma \le X \le \mu + \sigma) = 0.6827,$$
$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.9545,$$
$$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 0.9973.$$


In particular, if a random experiment with outcome $X$ is repeated many times, then 99.73% of the values of $X$ will be in the '$3\sigma$-interval' $[\mu - 3\sigma, \mu + 3\sigma]$, and only 0.27% of all outcomes will be outside this interval. In view of the symmetry of $f(x)$, this implies that for $\mu \ge 3\sigma$ negative values of $X$ occur with probability at most

$$\tfrac{1}{2}(1 - 0.9973) = 0.00135 = 0.135\%.$$

Thus, in case $\mu \ge 3\sigma$ the normal distribution can approximately serve as probability distribution for a nonnegative random variable. If $\mu < 3\sigma$, then a truncation with regard to $x = 0$ is recommended according to formula (2.66) with $c = 0$ and $d = \infty$. This makes sure that negative values cannot occur. The truncated normal distribution is a favorite model for lifetimes of systems subject to wear-out. Generally, for reasons to be substantiated later (section 5.2.3, page 208), the normal distribution is a suitable probability distribution for random variables that are generated by additive superposition of numerous effects.

A normally distributed random variable $X$ with parameters $\mu$ and $\sigma^2$ is denoted as

$$X = N(\mu, \sigma^2).$$
Generally, the standardization $S$ of a random variable $X$ as given by (2.69) does not have the same distribution type as $X$. But the standardization

$$S = \frac{X - \mu}{\sigma}$$

of a normally distributed random variable $X = N(\mu, \sigma^2)$ is again normally distributed. This can be seen as follows:

$$F_S(x) = P(S \le x) = P\!\left(\frac{X - \mu}{\sigma} \le x\right) = P(X \le \sigma x + \mu).$$

From (2.82), substituting there $u = (y - \mu)/\sigma$,

$$F_S(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\sigma x + \mu} e^{-\frac{(y - \mu)^2}{2\sigma^2}}\,dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$$

By comparison with (2.82), the right integral in this line is seen to be the distribution function of a normally distributed random variable with mean value 0 and variance 1. This implies the desired result, namely $S = N(0, 1)$. $S$ is said to be standard normal.
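Its distribution function is denoted as $\Phi(x)$:

$$\Phi(x) = P(N(0, 1) \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du, \qquad -\infty < x < \infty. \tag{2.85}$$

The corresponding density $\varphi(x) = \Phi'(x)$ is

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty. \tag{2.86}$$

$\Phi(x)$ or $\varphi(x)$, respectively, determines the standard normal distribution.


$\Phi(x)$ is closely related to the Gaussian error integral $\operatorname{Erf}(x)$, which led C. F. Gauss to the normal distribution:

$$\operatorname{Erf}(x) = \int_0^x e^{-y^2}\,dy.$$

Simple transformations, taking into account $\Phi(0) = 1/2$, yield

$$\Phi(x) = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \operatorname{Erf}\!\left(\frac{x}{\sqrt{2}}\right) \qquad \text{and} \qquad \operatorname{Erf}(x) = \sqrt{\pi}\left(\Phi(\sqrt{2}\,x) - \frac{1}{2}\right).$$

Since $\varphi(x)$ is symmetric with symmetry center $x_s = 0$ (Figure 2.20),

$$\Phi(x) = 1 - \Phi(-x).$$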

Figure 2.20 Density and percentiles of the standardized normal distribution


From this another useful formula results:

$$P(-x \le N(0, 1) \le +x) = \Phi(x) - \Phi(-x) = 2\Phi(x) - 1. \tag{2.87}$$
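
The standard normal distribution function and formula (2.87) are easy to evaluate with Python's standard library. The sketch below is illustrative only. It relies on `math.erf`, which uses the normalized error function $\operatorname{erf}(z) = (2/\sqrt{\pi}) \int_0^z e^{-y^2}\,dy$, so that $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$. The function names are assumptions made for this example.

```python
import math

def std_normal_cdf(x):
    """Phi(x) expressed through the normalized error function math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def central_probability(x):
    """P(-x <= N(0,1) <= x) = 2 * Phi(x) - 1, formula (2.87)."""
    return 2.0 * std_normal_cdf(x) - 1.0

# Probabilities of the k-sigma intervals, k = 1, 2, 3, rounded to four decimals
k_sigma = {k: round(central_probability(k), 4) for k in (1, 2, 3)}
```

The dictionary `k_sigma` contains the values 0.6827, 0.9545, and 0.9973 for the one-, two-, and three-sigma intervals.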


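Hence, there is the following relationship between the $\alpha$- and the $(1 - \alpha)$-percentiles of the standardized normal distribution:

$$x_{\alpha} = -x_{1 - \alpha}, \qquad 0 < \alpha < 1/2.$$

This is the reason for introducing the following notation (Figure 2.20):

$$z_{\alpha} = x_{1 - \alpha}, \qquad 0 < \alpha < 1/2.$$

Hence, with $\alpha$ replaced by $\alpha/2$,

$$P(-z_{\alpha/2} \le N(0, 1) \le z_{\alpha/2}) = \Phi(z_{\alpha/2}) - \Phi(-z_{\alpha/2}) = 1 - \alpha.$$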


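The distribution function $F(x)$ of $X = N(\mu, \sigma^2)$ can be expressed in terms of $\Phi(x)$ as follows:

$$F(x) = P(X \le x) = P\!\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P\!\left(N(0, 1) \le \frac{x - \mu}{\sigma}\right) = \Phi\!\left(\frac{x - \mu}{\sigma}\right).$$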


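Corollaries 1) The interval probabilities (2.5) are given for any normally distributed random variable $X = N(\mu, \sigma^2)$ by

$$P(a \le X \le b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right). \tag{2.88}$$

2) If $x_{\alpha}$ denotes the $\alpha$-percentile of $X = N(\mu, \sigma^2)$, then

$$\alpha = F(x_{\alpha}) = \Phi\!\left(\frac{x_{\alpha} - \mu}{\sigma}\right)$$

so that, for any $\alpha < 1/2$, since $\Phi(-z_{\alpha}) = \alpha$ with $z_{\alpha}$ denoting the $(1 - \alpha)$-percentile of the standardized normal distribution,

$$\frac{x_{\alpha} - \mu}{\sigma} = -z_{\alpha} \qquad \text{or} \qquad x_{\alpha} = \mu - \sigma z_{\alpha}.$$

Therefore, determining the percentiles of any normally distributed random variable can be done by a table of the percentiles of the standardized normal distribution.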

Example 2.17 A company needs cylinders with a diameter of 20 mm. It accepts deviations of up to 0.5 mm in either direction. The manufacturer produces these cylinders with a random diameter $X$, which has a $N(20, \sigma^2)$-distribution.

a) What percentage of cylinders is accepted by the company if $\sigma^2 = 0.04\ \text{mm}^2$?

Since the condition $\mu \ge 3\sigma$ is fulfilled ($\mu = 100\sigma$), $X$ can be considered a positive random variable. By (2.87) and (2.88), the probability to accept a cylinder is

$$\begin{aligned} P(|X - 20| \le 0.5) &= P(19.5 \le X \le 20.5) = P\!\left(\frac{19.5 - 20}{0.2} \le N(0, 1) \le \frac{20.5 - 20}{0.2}\right) \\ &= P(-2.5 \le N(0, 1) \le +2.5) = 2\,\Phi(2.5) - 1 \\ &= 2 \cdot 0.9938 - 1 = 0.9876. \end{aligned}$$

Thus, 98.76% of the produced cylinders are accepted.

b) What is the value of $\sigma^2$ if the company would reject 4% of the cylinders?

$$\begin{aligned} P(|X - 20| > 0.5) &= 1 - P(19.5 \le X \le 20.5) \\ &= 1 - P\!\left(\frac{19.5 - 20}{\sigma} \le N(0, 1) \le \frac{20.5 - 20}{\sigma}\right) \\ &= 1 - P\!\left(-\frac{0.5}{\sigma} \le N(0, 1) \le \frac{0.5}{\sigma}\right) = 1 - [2\,\Phi(0.5/\sigma) - 1] \\ &= 2\,[1 - \Phi(0.5/\sigma)]. \end{aligned}$$

The term $2\,[1 - \Phi(0.5/\sigma)]$ is required to be equal to 0.04. This leads to the equation

$$\Phi(0.5/\sigma) = 0.98.$$

Now one takes from the table that value $x_{0.98}$ for which $\Phi(x_{0.98}) = 0.98$. In other words, one determines the 0.98-percentile of the standardized normal distribution. This percentile is seen to be $x_{0.98} = 2.06$. Hence, the desired $\sigma$ must satisfy

$$0.5/\sigma = 2.06.$$

It follows that $\sigma = 0.2427$, i.e., $\sigma^2 \approx 0.0589$. □
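
Both parts of this example can be checked without a table by means of `statistics.NormalDist` from the Python standard library. The sketch below is illustrative; the variable names are assumptions. Note that the exact 0.98-percentile is about 2.054, so the resulting $\sigma$ differs slightly from the table-based value obtained with the rounded percentile 2.06.

```python
from statistics import NormalDist

std = NormalDist()                  # standard normal distribution N(0, 1)

# a) sigma^2 = 0.04, i.e. sigma = 0.2: acceptance probability 2 * Phi(0.5 / sigma) - 1
sigma_a = 0.04 ** 0.5
p_accept = 2.0 * std.cdf(0.5 / sigma_a) - 1.0

# b) rejection probability 2 * (1 - Phi(0.5 / sigma)) = 0.04, i.e. Phi(0.5 / sigma) = 0.98
z = std.inv_cdf(0.98)               # 0.98-percentile of N(0, 1)
sigma_b = 0.5 / z
```

Here `p_accept` is about 0.9876, `z` about 2.054, and `sigma_b` about 0.2435.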


Example 2.18 From a data set collected over 32 years, the monthly rainfall from November to February in an area has been found to be normally distributed with mean value 92 mm and variance 784 mm², i.e., $\sigma = 28$ mm. (Again, the condition $\mu \ge 3\sigma$ is fulfilled.)


What are the probabilities of the 'extreme cases' that the monthly rainfall during the given time period (1) is between 0 and 30 mm, and (2) exceeds 150 mm?

$$\text{(1)} \quad P(0 \le X \le 30) = P\!\left(\frac{0 - 92}{28} \le N(0, 1) \le \frac{30 - 92}{28}\right) = \Phi(-2.214) - \Phi(-3.286) \approx \Phi(-2.214) \approx 0.0135.$$

$$\text{(2)} \quad P(X > 150) = P\!\left(N(0, 1) > \frac{150 - 92}{28}\right) = 1 - \Phi(2.071) \approx 1 - 0.981 = 0.019. \qquad \square$$

The first four moments (2.56) of the normal distribution $N(\mu, \sigma^2)$ are

$$\begin{aligned} \mu_1 &= \mu = E(X), \\ \mu_2 &= \sigma^2 + \mu^2, \\ \mu_3 &= 3\mu\sigma^2 + \mu^3, \\ \mu_4 &= \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4, \end{aligned}$$

and its first four central moments (2.57) are

$$m_1 = 0, \quad m_2 = \sigma^2, \quad m_3 = 0, \quad m_4 = 3\sigma^4.$$
In view of the key role the normal distribution plays in probability theory, it is useful, particularly for applications, to know how well any other probability distribution can be approximated by the normal distribution. This information is provided by the excess $\gamma_E$, defined for any probability distribution with second central moment $m_2$ and fourth central moment $m_4$:

$$\gamma_E = \frac{m_4}{(m_2)^2} - 3.$$

Since $\gamma_E$ is 0 for $N(\mu, \sigma^2)$, the excess can serve as a measure for the deviation of the distribution of any random variable with mean $\mu$ and variance $\sigma^2$ from $N(\mu, \sigma^2)$ in a neighborhood of $\mu$.


Figure 2.21 Densities of the logarithmic normal distribution
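
Logarithmic Normal Distribution A random variable $Y$ has a logarithmic normal distribution or log-normal distribution with parameters $\mu$ and $\sigma^2$ if it has distribution function and density (Figure 2.21)

$$F(y) = \Phi\!\left(\frac{\ln y - \mu}{\sigma}\right), \qquad y > 0, \ \sigma > 0, \ -\infty < \mu < \infty,$$

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma y}\, e^{-\frac{(\ln y - \mu)^2}{2\sigma^2}}, \qquad y > 0, \ \sigma > 0, \ -\infty < \mu < \infty.$$

Thus, $Y$ has a log-normal distribution with parameters $\mu$ and $\sigma^2$ if it has structure $Y = e^X$ with $X = N(\mu, \sigma^2)$. Hence, if $y_{\alpha}$ is the $\alpha$-percentile of the log-normal distribution and $x_{\alpha}$ the $\alpha$-percentile of $N(\mu, \sigma^2)$, then $y_{\alpha} = e^{x_{\alpha}}$, or, in terms of the $\alpha$-percentile $u_{\alpha}$ of the standard normal distribution, $y_{\alpha} = e^{\sigma u_{\alpha} + \mu}$. Since $u_{0.5} = 0$, the median is $y_{0.5} = e^{\mu}$. The distribution is unimodal with mode $y_m = e^{\mu - \sigma^2}$.


Mean value and variance of $Y$ are

$$E(Y) = e^{\mu + \sigma^2/2}, \qquad \operatorname{Var}(Y) = [E(Y)]^2\,(e^{\sigma^2} - 1).$$

The Charlier skewness and the excess are

$$\gamma_C = \left(\sqrt{e^{\sigma^2} - 1}\,\right)(e^{\sigma^2} + 2), \qquad \gamma_E = e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6.$$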

Example 2.19 Like the Rosin-Rammler distribution, the logarithmic normal distribution is a favorite model for the particle size of stone and other materials after a grinding process. Statistical analysis has shown that the diameter $X$ of lava rock particles after a grinding process in a special mill has a logarithmic normal distribution with mean value $E(X) = 1.3002$ mm and variance $\operatorname{Var}(X) = 0.0778$ mm².

What percentage of particles have their diameter in $I = [1.1\ \text{mm}, 1.5\ \text{mm}]$?


Solving the system of equations $E(X) = 1.3002$, $\operatorname{Var}(X) = 0.0778$ for $\mu$ and $\sigma^2$ gives $\mu = 0.24$ and $\sigma^2 = 0.045$, i.e., $\sigma = 0.212$. Therefore,

$$\begin{aligned} P(1.1 \le X \le 1.5) &= \Phi\!\left(\frac{\ln 1.5 - 0.24}{0.212}\right) - \Phi\!\left(\frac{\ln 1.1 - 0.24}{0.212}\right) \\ &= \Phi(0.781) - \Phi(-0.683) \approx 0.783 - 0.247 = 0.536. \end{aligned}$$

Thus, the corresponding percentage of particles is about 53.6%. □
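
The parameter fit and the interval probability of this example can be reproduced as follows. The sketch assumes the log-normal moment formulas $E = e^{\mu + \sigma^2/2}$ and $\operatorname{Var} = E^2(e^{\sigma^2} - 1)$, which give $\sigma^2 = \ln(1 + \operatorname{Var}/E^2)$ and $\mu = \ln E - \sigma^2/2$; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean, var = 1.3002, 0.0778           # given mean value (mm) and variance (mm^2)

# Invert the moment formulas of the log-normal distribution
sigma2 = math.log(1.0 + var / mean ** 2)
mu = math.log(mean) - sigma2 / 2.0
sigma = math.sqrt(sigma2)

# P(1.1 <= X <= 1.5) via the standard normal distribution function
phi = NormalDist().cdf
p = phi((math.log(1.5) - mu) / sigma) - phi((math.log(1.1) - mu) / sigma)
```

Without intermediate rounding, `mu` is about 0.240, `sigma2` about 0.045, and `p` about 0.535, which agrees with the table-based result up to rounding.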

Figure 2.22 Density of the logistic distribution


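Logistic Distribution A random variable $X$ has a logistic distribution with parameters $\mu$ and $\sigma$ if it has distribution function

$$F(x) = \frac{1}{1 + e^{-\frac{\pi}{\sqrt{3}}\,\frac{x - \mu}{\sigma}}}, \qquad -\infty < x < +\infty, \ \sigma > 0,$$

and density (Figure 2.22)

$$f(x) = \frac{\pi}{\sqrt{3}\,\sigma}\; \frac{e^{-\frac{\pi}{\sqrt{3}}\,\frac{x - \mu}{\sigma}}}{\left(1 + e^{-\frac{\pi}{\sqrt{3}}\,\frac{x - \mu}{\sigma}}\right)^2}, \qquad -\infty < x < +\infty, \ \sigma > 0.$$

This distribution is symmetric with regard to $\mu$. Mean value, variance, and excess are

$$E(X) = \mu, \qquad \operatorname{Var}(X) = \sigma^2, \qquad \gamma_E = 1.2.$$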


The denominator of F(x) has the functional structure of a well-known growth curve 
originally proposed by Verhulst (1845). Generally, the logistic distribution proved to 
be a suitable probabilistic model for growth processes with saturation (i.e., not exceed- 
ing a given upper bound) of plants, in particular trees. 

Figure 2.23 Densities of the inverse Gaussian distribution


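Inverse Gaussian Distribution A random variable $X$ has an inverse Gaussian distribution or a Wald distribution with positive parameters $\alpha$ and $\beta$ if it has the density (Figure 2.23)

$$f(x) = \sqrt{\frac{\alpha}{2\pi x^3}}\, \exp\!\left(-\frac{\alpha (x - \beta)^2}{2\beta^2 x}\right), \qquad x > 0. \tag{2.89}$$

Integration gives the corresponding distribution function

$$F(x) = \Phi\!\left(\frac{x - \beta}{\beta}\sqrt{\frac{\alpha}{x}}\right) + e^{2\alpha/\beta}\, \Phi\!\left(-\frac{x + \beta}{\beta}\sqrt{\frac{\alpha}{x}}\right), \qquad x > 0.$$

Mean value, variance, and mode are

$$E(X) = \beta, \qquad \operatorname{Var}(X) = \beta^3/\alpha, \qquad x_m = \beta\left(\sqrt{1 + (3\beta/2\alpha)^2} - 3\beta/2\alpha\right). \tag{2.90}$$

Charlier's skewness and excess are

$$\gamma_C = 3\sqrt{\beta/\alpha}, \qquad \gamma_E = 15\,\beta/\alpha.$$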

The practical significance of the inverse Gaussian distribution is mainly due to the fact that it is the first passage time distribution of the Brownian motion process and some of its derivatives (pages 504, 513). This has made the inverse Gaussian distribution a favorite model for predicting times to failure of systems that are subject to wear-out.


2.3.4 Nonparametric Classes of Probability Distributions 


This section is restricted to the class of nonnegative random variables. Lifetimes of technical systems and organisms are likely to be the most prominent members of this class. Hence, the terminology is tailored to this application. The lifetime of a system is the time from its starting-up time point (birth) to its failure (death), where 'failure' is assumed to be an instantaneous event. In the engineering context, a failure of a system need not be equivalent to the end of its useful life. If $X$ is the lifetime of a system with distribution function $F(x) = P(X \le x)$, then $F(x)$ is called its failure probability and $\bar{F}(x) = 1 - F(x)$ is its survival probability. $F(x)$ and $\bar{F}(x)$ are the respective probabilities that the system does or does not fail in the interval $[0, x]$.


Figure 2.24 Illustration of the residual lifetime


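Residual Lifetime Let $F_t(x)$ be the distribution function of the residual lifetime $X_t$ of a system that has already worked for $t$ time units without failing (Figure 2.24):

$$F_t(x) = P(X_t \le x) = P(X - t \le x \mid X > t).$$

By the formula of the conditional probability (1.22),

$$F_t(x) = \frac{P(X - t \le x \,\cap\, X > t)}{P(X > t)} = \frac{P(t < X \le t + x)}{P(X > t)}$$

so that, by (2.44), page 59,

$$F_t(x) = \frac{F(t + x) - F(t)}{1 - F(t)}, \qquad t \ge 0, \ x \ge 0. \tag{2.91}$$

The corresponding conditional survival probability $\bar{F}_t(x) = 1 - F_t(x)$ is

$$\bar{F}_t(x) = \frac{\bar{F}(t + x)}{\bar{F}(t)}, \qquad t \ge 0, \ x \ge 0, \tag{2.92}$$

where $\bar{F}(x) = 1 - F(x)$. Hence, by using formula (2.52), the mean residual lifetime $\mu(t) = E(X_t)$ is seen to be

$$\mu(t) = \int_0^{\infty} \bar{F}_t(x)\,dx = \frac{1}{\bar{F}(t)} \int_t^{\infty} \bar{F}(x)\,dx. \tag{2.93}$$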
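
Formulas (2.91) and (2.93) translate directly into code. The following Python sketch is an illustration under stated assumptions: the two lifetime distributions (exponential with rate 0.5 and uniform over $[0, 10]$), the finite upper integration limit, and the midpoint rule are choices made for this example only.

```python
import math

def residual_cdf(F, t, x):
    """F_t(x) = (F(t + x) - F(t)) / (1 - F(t)), formula (2.91)."""
    survival_t = 1.0 - F(t)
    if survival_t <= 0.0:
        raise ValueError("P(X > t) must be positive")
    return (F(t + x) - F(t)) / survival_t

def mean_residual_life(F, t, upper, steps=100_000):
    """mu(t) by (2.93): midpoint-rule integral of 1 - F over [t, upper], divided by 1 - F(t)."""
    h = (upper - t) / steps
    integral = sum(1.0 - F(t + (k + 0.5) * h) for k in range(steps)) * h
    return integral / (1.0 - F(t))

def exp_cdf(x, lam=0.5):
    """Exponential distribution function with hypothetical rate lam."""
    return 1.0 - math.exp(-lam * x)

def unif_cdf(x, T=10.0):
    """Uniform distribution function over [0, T] with hypothetical T."""
    return min(max(x / T, 0.0), 1.0)

mrl_exp = mean_residual_life(exp_cdf, 3.0, upper=83.0)    # tail beyond 83 is negligible
mrl_unif = mean_residual_life(unif_cdf, 4.0, upper=10.0)
```

For the exponential lifetime, the mean residual lifetime stays at $1/\lambda = 2$ for every age $t$, whereas for the uniform lifetime it equals $(T - t)/2$ and thus shrinks with increasing age.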

Example 2.20 Let the lifetime $X$ have a uniform distribution over $[0, T]$:

$$F(x) = x/T, \qquad 0 \le x \le T.$$

Then,

$$F_t(x) = \frac{x}{T - t}, \qquad 0 \le t < T, \ 0 \le x \le T - t.$$

Thus, $X_t$ is uniformly distributed over the interval $[0, T - t]$, and for fixed $x$, the conditional failure probability is increasing with increasing age $t$ of the system, $t < T$. □


Example 2.21 Let $X$ have an exponential distribution with parameter $\lambda$:

$$F(x) = 1 - e^{-\lambda x}, \qquad x \ge 0.$$

Then, for given $t > 0$, the conditional failure probability of the system in $(t, t + x]$ is

$$F_t(x) = \frac{(1 - e^{-\lambda(t + x)}) - (1 - e^{-\lambda t})}{e^{-\lambda t}} = 1 - e^{-\lambda x} = F(x), \qquad x \ge 0.$$

That means, if a system with exponentially distributed lifetime is known to have survived the interval $[0, t]$, then it is at time point $t$ 'as good as new' from the point of view of its future failure behavior, since its residual lifetime $X_t$ has the same failure probability as the system had at time point $t = 0$, when it started operating. In other words, systems with property

$$F_t(x) = F(x) \qquad \text{for all } x, t \ge 0 \tag{2.94}$$

'do not age'. Thus, the exponential distribution is the continuous analogue to the geometric distribution (example 2.3). It is, moreover, the only continuous distribution which has this so-called memoryless property or lack of memory property. Usually, systems (technical or biological ones) have this nonaging property only in certain finite subintervals of their useful life. These intervals start after the early failures have tapered off and last till wear-out processes start. In the nonaging period, failures or deaths are caused by purely random influences such as natural catastrophes or accidents. In real life there is always some overlap of the early failure, nonaging, and wear-out periods.


The fundamental relationship (2.94) is equivalent to the functional equation

$$\bar{F}(t + x) = \bar{F}(x) \cdot \bar{F}(t)$$

for the survival probability $\bar{F}(x) = 1 - F(x)$. Only functions of type $e^{ax}$ are solutions of this equation, where $a$ is a constant.


The engineering and biological background of the conditional failure probability 
motivates the following definition: 


Definition 2.3 A system is aging (rejuvenating) in the interval $[t_1, t_2]$, $t_1 < t_2$, if for an arbitrary but fixed $x$, $x > 0$, the conditional failure probability $F_t(x)$ is increasing (decreasing) with increasing $t$, $t_1 \le t \le t_2$. •


Remark Here and in what follows the terms 'increasing' and 'decreasing' have the meaning of 
‘nondecreasing’ and 'nonincreasing', respectively. 

For technical systems periods of rejuvenation may be due to maintenance actions, and for human beings due to successful medical treatment or adopting a healthier lifestyle.


Provided the density $f(x) = F'(x)$ exists, another approach to modeling the aging behavior of a system is based on the concept of its failure rate. To derive this rate, the conditional failure probability $F_t(\Delta t)$ of a system in the interval $[t, t + \Delta t]$ is considered relative to the length $\Delta t$ of this interval. This gives a conditional failure probability per unit time, i.e. a 'failure probability rate':

$$\frac{1}{\Delta t}\, F_t(\Delta t) = \frac{1}{\bar{F}(t)} \cdot \frac{F(t + \Delta t) - F(t)}{\Delta t}.$$

If $\Delta t \to 0$, the second ratio on the right-hand side tends to $f(t)$. Hence,

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t}\, F_t(\Delta t) = f(t)/\bar{F}(t). \tag{2.95}$$

This limit is called failure rate or hazard function, and it is denoted as $\lambda(t)$:

$$\lambda(t) = f(t)/\bar{F}(t). \tag{2.96}$$


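In demography and actuarial science, $\lambda(t)$ is called force of mortality. Integration on both sides of (2.96) yields

$$F(x) = 1 - e^{-\int_0^x \lambda(t)\,dt} \qquad \text{or} \qquad \bar{F}(x) = e^{-\int_0^x \lambda(t)\,dt}, \qquad x \ge 0. \tag{2.97}$$

By introducing the integrated failure rate

$$\Lambda(x) = \int_0^x \lambda(t)\,dt,$$

$F(x)$, $F_t(x)$ and the corresponding survival probabilities can be written as follows:

$$F(x) = 1 - e^{-\Lambda(x)}, \qquad \bar{F}(x) = e^{-\Lambda(x)},$$

$$F_t(x) = 1 - e^{-[\Lambda(t + x) - \Lambda(t)]}, \qquad \bar{F}_t(x) = e^{-[\Lambda(t + x) - \Lambda(t)]}, \qquad x \ge 0, \ t \ge 0. \tag{2.98}$$

This representation of $F_t(x)$ implies an important property of the failure rate: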


A system ages in $[t_1, t_2]$, $t_1 < t_2$, if and only if its failure rate is increasing in this interval.


Formula (2.95) can be interpreted in the following way: For small $\Delta t$,

$$F_t(\Delta t) \approx \lambda(t)\,\Delta t. \tag{2.99}$$

Thus, for $\Delta t$ sufficiently small, $\lambda(t)\,\Delta t$ is approximately the probability that the system fails 'shortly' after time point $t$ if it has survived the interval $[0, t]$. Hence, the failure rate gives information on both the instantaneous tendency of a system to fail and its 'state of wear' at any age $t$.

The relationship (2.99) can be written more exactly in the form

$$F_t(\Delta t) = \lambda(t)\,\Delta t + o(\Delta t),$$

where $o(x)$ is the Landau order symbol with respect to $x \to 0$, i.e. $o(x)$ is any function of $x$ with property

$$\lim_{x \to 0} \frac{o(x)}{x} = 0. \tag{2.100}$$

In the ratio of (2.100), both the function $y_1(x) = o(x)$ and the function $y_2(x) = x$ tend to 0 if $x \to 0$, but $y_1(x) = o(x)$ must approach 0 'much faster' than $y_2(x) = x$ if $x \to 0$. Otherwise (2.100) could not be true.


The relationship (2.99) can be used for the statistical estimation of $\lambda(t)$: At time $t = 0$, $n$ identical systems start operating. Let $n(t)$ be the number of those systems that have failed in the interval $[0, t]$. Then the number of systems that have survived $[0, t]$ is $n - n(t)$, and the number of systems that have failed in the interval $(t, t + \Delta t]$ is $n(t + \Delta t) - n(t)$. Then an estimate for the system failure rate in $(t, t + \Delta t]$ is

$$\hat{\lambda}(x) = \frac{1}{\Delta t} \cdot \frac{n(t + \Delta t) - n(t)}{n - n(t)}, \qquad t < x \le t + \Delta t.$$

Based on the behaviour of the conditional failure probability of systems, numerous 
nonparametric classes of probability distributions have been proposed and investigat- 
ed during the past 60 years. Originally, they aimed at applications in reliability engi- 
neering, but nowadays these classes also play an important role in fields like demo- 
graphy, actuarial science, and risk analysis. 
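
Returning to the statistical estimate of the failure rate based on relationship (2.99), the counting procedure can be sketched in a few lines of Python. The failure times and the function name below are hypothetical; the estimate on each interval $(t, t + \Delta t]$ is the number of failures in that interval divided by $\Delta t$ times the number of systems still working at time $t$.

```python
def failure_rate_estimates(failure_times, n, dt, horizon):
    """Return (t, estimate) pairs with estimate = (n(t + dt) - n(t)) / (dt * (n - n(t)))."""
    estimates = []
    steps = int(round(horizon / dt))
    for k in range(steps):
        t = k * dt
        failed_by_t = sum(1 for x in failure_times if x <= t)
        failed_by_t_dt = sum(1 for x in failure_times if x <= t + dt)
        survivors = n - failed_by_t
        if survivors == 0:           # no system left that could fail
            break
        estimates.append((t, (failed_by_t_dt - failed_by_t) / (dt * survivors)))
    return estimates

# Hypothetical observation: 10 systems, 8 of which fail within 4 time units
times = [0.5, 1.2, 1.7, 2.1, 2.6, 2.9, 3.3, 3.8]
est = failure_rate_estimates(times, n=10, dt=1.0, horizon=4.0)
```

For these made-up data the estimates on the four unit intervals are 1/10, 2/9, 3/7, and 2/4, i.e. they increase with age.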


Definition 2.4 $F(x)$ is an IFR (increasing failure rate) or a DFR (decreasing failure rate) distribution if $F_t(x)$ is increasing or decreasing in $t$ for fixed but arbitrary $x$, respectively. Briefly: $F(x)$ is IFR (DFR). •


If the density $f(x) = F'(x)$ exists, then (2.98) implies the following corollary:


Corollary $F(x)$ is IFR (DFR) in the interval $[x_1, x_2]$, $x_1 < x_2$, if and only if the corresponding failure rate $\lambda(x)$ is increasing (decreasing) in $[x_1, x_2]$.


The Weibull distribution shows that, within one and the same parametric class of probability distributions, a distribution may belong to different nonparametric classes of probability distributions: From (2.77) and (2.97),

$$\Lambda(x) = (x/\theta)^{\beta}$$

so that

$$\lambda(x) = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1}, \qquad x \ge 0.$$

Hence, the Weibull distribution is IFR for $\beta > 1$ and DFR for $\beta < 1$. For $\beta = 1$ the failure rate is constant: $\lambda = \beta/\theta = 1/\theta$ (exponential distribution). The exponential distribution is both IFR and DFR. This versatility of the Weibull distribution is one reason for its being a favorite model in applications.
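
The dependence of the aging behavior on the shape parameter can be made visible by tabulating the Weibull failure rate. The following Python sketch is illustrative; the scale $\theta = 2$, the grid of ages, and the function names are assumptions for this example.

```python
import math

def weibull_hazard(x, theta, beta):
    """Failure rate (beta / theta) * (x / theta)**(beta - 1) for x > 0."""
    return (beta / theta) * (x / theta) ** (beta - 1.0)

def weibull_survival(x, theta, beta):
    """Survival probability exp(-(x / theta)**beta)."""
    return math.exp(-((x / theta) ** beta))

theta = 2.0                                  # hypothetical scale parameter
grid = [0.5, 1.0, 1.5, 2.0, 2.5]             # ages at which the rate is evaluated
rates = {beta: [weibull_hazard(x, theta, beta) for x in grid]
         for beta in (0.5, 1.0, 2.0)}
```

The list `rates[2.0]` is increasing (IFR), `rates[0.5]` is decreasing (DFR), and `rates[1.0]` is constant at $1/\theta = 0.5$.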

The failure rate (force of mortality) of human beings and other organisms is usually not strictly increasing. In short time periods, for instance, after having overcome a serious disease or another life-threatening situation, the failure rate will decrease, although the average failure rate will definitely increase. Actually, in view of the finite lifetimes of organisms, their failure rates $\lambda(x)$ will tend to infinity as $x \to \infty$. Analogously, technical systems that operate under different, time-dependent stress levels (temperature, pressure, humidity, speed) will not have strictly increasing failure rates, although in the long run their average failure rates are increasing. This motivates the following definition:


Definition 2.5 $F(x)$ is an IFRA (increasing failure rate average) distribution or a DFRA (decreasing failure rate average) distribution if

$$-\frac{1}{x} \ln \bar{F}(x)$$

is an increasing or a decreasing function in $x$, respectively. •


To justify the terminology, assuming the density $f(x) = F'(x)$ exists and taking the natural logarithm on both sides of the right equation in (2.97) yields

$$\ln \bar{F}(x) = -\int_0^x \lambda(t)\,dt.$$

Therefore,

$$\bar{\lambda}(x) = -\frac{1}{x} \ln \bar{F}(x) = \frac{1}{x} \int_0^x \lambda(t)\,dt$$

so that $-(1/x) \ln \bar{F}(x)$ turns out to be the average failure rate in $[0, x]$. An advantage of definitions 2.3 to 2.5 is that they do not require the existence of the density. But the existence of the density and, hence, the existence of the failure rate, motivates the terminology. Other intuitive proposals for nonparametric classes are based on the 'new better than used' concept or on the behavior of the mean residual lifetime $\mu(t)$; see Lai, Xie (2006) for a comprehensive survey.


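Obviously, $F(x)$ being IFR (DFR) implies $F(x)$ being IFRA (DFRA):

$$\text{IFR} \Rightarrow \text{IFRA}, \qquad \text{DFR} \Rightarrow \text{DFRA}.$$

Knowing the type of the nonparametric class $F(x)$ belongs to allows the construction of upper or lower bounds on $F(x)$ or $\bar{F}(x) = 1 - F(x)$. For instance, if $\mu_n = E(X^n)$ is the $n$th moment of $X$ and $F(x) = P(X \le x)$ is IFR, then

$$\bar{F}(x) \ge \begin{cases} \exp\{-x\,(n!/\mu_n)^{1/n}\} & \text{for } x \le \mu_n^{1/n}, \\ 0 & \text{otherwise}. \end{cases}$$

In particular, for $n = 1$ with $\mu = \mu_1 = E(X)$,

$$\bar{F}(x) \ge \begin{cases} \exp\{-x/\mu\} & \text{for } x \le \mu, \\ 0 & \text{otherwise}. \end{cases} \tag{2.101}$$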

Figure 2.25 Upper and lower bounds for example 2.22 (bounds for the survival probability; the upper bound is (2.103))


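If $F(x)$ is IFR, then

$$\sup_x \left|\bar{F}(x) - e^{-x/\mu}\right| \le 1 - \sqrt{2\gamma + 1} \tag{2.102}$$

with

$$\gamma = \frac{\mu_2}{2\mu^2} - 1.$$

It can be shown that $\gamma \le 0$ ($\gamma \ge 0$) if $F(x)$ is IFR (DFR).

If $F(x)$ is IFRA, then

$$\bar{F}(x) \le \begin{cases} 1 & \text{for } x \le \mu, \\ e^{-rx} & \text{for } x > \mu, \end{cases} \tag{2.103}$$

where $r = r(x, \mu)$ is the unique positive solution of

$$1 - r\mu = e^{-rx}.$$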
Example 2.22 Let $X$ have distribution function

$$F(x) = P(X \le x) = 1 - e^{-x^2}, \qquad x \ge 0.$$

This is a Rayleigh distribution (page 77) so that $F(x)$ is IFR and $X$ has mean value $\mu = E(X) = \sqrt{\pi/4}$ and second moment $\mu_2 = \operatorname{Var}(X) + \mu^2 = 1$ (see formulas (2.80)). Figure 2.25 compares the exact graph of the corresponding survival probability $\bar{F}(x)$ with the lower bound (2.101) and the upper bound (2.103). By (2.102), an upper bound for the maximum deviation of the exact graph of $\bar{F}(x)$ from the exponential survival probability with the same mean $\mu = \sqrt{\pi/4}$ as $X$ is, since $\gamma = 2/\pi - 1 = -0.3634$,

$$\sup_x \left|\bar{F}(x) - e^{-x/\mu}\right| = \sup_x \left|e^{-x^2} - e^{-x/\sqrt{\pi/4}}\right| \le 0.4773. \qquad \square$$
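
The bounds used in this example can be evaluated numerically. The sketch below is illustrative and rests on the following assumptions: the survival probability is $e^{-x^2}$, the lower bound is $e^{-x/\mu}$ for $x \le \mu$ and 0 otherwise, and the upper bound is $e^{-rx}$ for $x > \mu$, with $r$ the positive root of $1 - r\mu = e^{-rx}$ found by bisection. The grid, the number of bisection steps, and the function names are arbitrary choices.

```python
import math

mu = math.sqrt(math.pi / 4.0)        # mean of the Rayleigh distribution with theta = 1
mu2 = 1.0                            # second moment

def survival(x):
    """Exact survival probability exp(-x**2)."""
    return math.exp(-x * x)

def lower_bound(x):
    """IFR lower bound: exp(-x / mu) for x <= mu, 0 otherwise."""
    return math.exp(-x / mu) if x <= mu else 0.0

def upper_bound(x, iterations=200):
    """IFRA upper bound: 1 for x <= mu, exp(-r * x) otherwise."""
    if x <= mu:
        return 1.0
    g = lambda r: 1.0 - r * mu - math.exp(-r * x)
    lo = math.log(x / mu) / x        # g is maximal here, hence positive
    hi = 1.0 / mu                    # g(hi) = -exp(-x / mu) is negative
    for _ in range(iterations):      # bisection for the positive root of g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(-0.5 * (lo + hi) * x)

gamma = mu2 / (2.0 * mu ** 2) - 1.0
deviation_bound = 1.0 - math.sqrt(2.0 * gamma + 1.0)     # right-hand side of (2.102)

grid = [k / 1000.0 for k in range(1, 5001)]
max_deviation = max(abs(survival(x) - math.exp(-x / mu)) for x in grid)
```

On this grid the observed maximum deviation `max_deviation` stays well below `deviation_bound`, which evaluates to about 0.4773.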

2.4 MIXTURES OF RANDOM VARIABLES


The probability distribution $P_X$ (definition 2.1) of any random variable X depends on one or more numerical parameters. To emphasize the dependency on a special parameter $\lambda$, in this section the notation $P_X(\lambda)$ instead of $P_X$ is used. Equivalently, in terms of the distribution function and density of X,

$$F_X(x) = F_X(x, \lambda), \qquad f_X(x) = f_X(x, \lambda).$$


Mixtures of random variables or, equivalently, their probability distributions arise from the assumption that the parameter $\lambda$ is a realization (value) of a random variable L, and all the probability distributions belonging to the set $\{P_X(\lambda), \lambda \in R_L\}$, where $R_L$ is the range of L, are mixed in a way to be explained as follows:


1. Discrete random parameter L Let L have range $R_L = \{\lambda_0, \lambda_1, ...\}$ and probability distribution

$$P_L = \{\pi_0, \pi_1, ...\} \text{ with } \pi_n = P(L = \lambda_n), \; n = 0, 1, ..., \quad \sum_{n=0}^{\infty} \pi_n = 1.$$
Then the mixture of the probability distributions of type $P_X(\lambda)$ in terms of the mixture of the corresponding probability distribution functions of type $F_X(x, \lambda)$, $\lambda \in R_L$, is defined as

$$G(x) = \sum_{n=0}^{\infty} F_X(x, \lambda_n)\, \pi_n.$$


2. Continuous random parameter L Let L have range $R_L$ with $R_L \subseteq (-\infty, +\infty)$ and probability density
$$f_L(\lambda), \quad \lambda \in R_L.$$
Then the mixture of the probability distributions of type $P_X(\lambda)$ in terms of the distribution functions of type $F_X(x, \lambda)$ is defined as
$$G(x) = \int_{R_L} F_X(x, \lambda)\, f_L(\lambda)\, d\lambda.$$


Thus, if L is a discrete random variable, then G(x) is the weighted sum of the distribution functions $F_X(x, \lambda_n)$ with weights $\pi_n$ given by the probability mass function of L. If L is continuous, then G(x) is the weighted integral of $F_X(x, \lambda)$ with weight function $f_L(\lambda)$. In either case, G(x) has properties (2.3) and (2.4) so that it is the distribution function of a random variable Y, called a mixed random variable, and the probability distribution of Y is the mixture of probability distributions of type $P_X(\lambda)$.


If X is continuous and L discrete, then the density of Y is
$$g(x) = \sum_{n=0}^{\infty} f_X(x, \lambda_n)\, \pi_n.$$
If X and L are continuous, then the density of Y is

$$g(x) = \int_{R_L} f_X(x, \lambda)\, f_L(\lambda)\, d\lambda.$$
Formally, G(x) and g(x) are the mean values of $F_X(x, L)$ and $f_X(x, L)$, respectively:
$$G(x) = E(F_X(x, L)), \qquad g(x) = E(f_X(x, L)).$$
If L is discrete and X is discrete with probability distribution
$$P_X(\lambda) = \{p_i(\lambda) = P(X = x_i; \lambda); \; i = 0, 1, ...\},$$

then the probability distribution of Y, given so far by its distribution function G(x), can also be characterized by its individual probabilities:

$$P(Y = x_i) = \sum_{n=0}^{\infty} p_i(\lambda_n)\, \pi_n = E(p_i(L)); \quad i = 0, 1, .... \tag{2.104}$$
If L is continuous and X is discrete, then
$$P(Y = x_i) = \int_{R_L} p_i(\lambda)\, f_L(\lambda)\, d\lambda = E(p_i(L)). \tag{2.105}$$


The probability distribution of L is sometimes called structure or mixing distribution. Hence, the probability distribution $P_Y$ of the 'mixed random variable' Y is a mixture of probability distributions of type $P_X(\lambda)$ with regard to a structure distribution $P_L$.


The mixture of probability distributions provides a method for producing types of 
probability distributions, which are specifically tailored to serve the needs of special 
applications. 


Example 2.23 (mixture of exponential distributions) Assume X has an exponential distribution with parameter $\lambda$:

$$F_X(x, \lambda) = P(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0.$$

This distribution is to be mixed with regard to a structure distribution $P_L$, where L is exponentially distributed with density

$$f_L(\lambda) = \mu e^{-\mu\lambda}, \quad \mu > 0.$$
Mixing yields the distribution function

$$G(x) = \int_0^{\infty} F_X(x, \lambda)\, f_L(\lambda)\, d\lambda = \int_0^{\infty} (1 - e^{-\lambda x})\, \mu e^{-\mu\lambda}\, d\lambda = 1 - \frac{\mu}{x + \mu}.$$
Hence, mixing exponential distributions with regard to an exponential structure distribution gives the Lomax distribution with distribution function and density

$$G(x) = \frac{x}{x + \mu}, \qquad g(x) = \frac{\mu}{(x + \mu)^2}, \quad x \ge 0, \; \mu > 0. \tag{2.106}$$

The Lomax distribution is also known as Pareto distribution of the second kind. □
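
The mixing integral of Example 2.23 can be compared with the closed form (2.106) by simple numerical quadrature. This is only a sketch: the midpoint rule, the truncation point `40 / mu`, the step count, and the function names are assumptions made for illustration.

```python
import math

def mixture_cdf(x, mu, steps=100_000):
    """Midpoint-rule approximation of
    G(x) = int_0^inf (1 - exp(-lam*x)) * mu * exp(-mu*lam) dlam.
    The integral is truncated at lam = 40/mu, where the weight exp(-40) is negligible."""
    upper = 40.0 / mu
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        total += (1.0 - math.exp(-lam * x)) * mu * math.exp(-mu * lam)
    return total * h

def lomax_cdf(x, mu):
    """Closed form (2.106): G(x) = x / (x + mu)."""
    return x / (x + mu)

for x in (0.5, 2.0, 10.0):
    print(x, mixture_cdf(x, 1.5), lomax_cdf(x, 1.5))
```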


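Example 2.24 (mixture of binomial distributions) Let X have a binomial distribution with parameters n and p:

$$P(X = i) = \binom{n}{i} p^i (1 - p)^{n-i}, \quad i = 0, 1, 2, ..., n.$$
The parameter n is considered to be a value of a random variable N that has a Poisson distribution with parameter $\lambda$:

$$P(N = n) = \frac{\lambda^n}{n!} e^{-\lambda}; \quad n = 0, 1, ... \quad (\lambda \text{ fixed}).$$
Then, from (2.104), using
$$\binom{n}{i} = 0 \text{ for } n < i,$$
the mixture of binomial distributions $P_X(n)$, n = 0, 1, ..., with regard to the Poisson structure distribution $P_N$ is obtained as follows:

$$P(Y = i) = \sum_{n=0}^{\infty} \binom{n}{i} p^i (1 - p)^{n-i}\, \frac{\lambda^n}{n!} e^{-\lambda} = \sum_{n=i}^{\infty} \binom{n}{i} p^i (1 - p)^{n-i}\, \frac{\lambda^n}{n!} e^{-\lambda}$$

$$= \frac{(\lambda p)^i}{i!} e^{-\lambda} \sum_{k=0}^{\infty} \frac{[\lambda(1 - p)]^k}{k!} = \frac{(\lambda p)^i}{i!} e^{-\lambda} e^{\lambda(1 - p)}.$$

Thus,
$$P(Y = i) = \frac{(\lambda p)^i}{i!} e^{-\lambda p}, \quad i = 0, 1, ....$$
This is a Poisson distribution with parameter $\lambda p$. □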
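
The result of Example 2.24 can be reproduced by evaluating the mixture sum (2.104) directly. The sketch below truncates the infinite sum at a hypothetical cutoff `n_max`; the parameter values and function names are illustrative only.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binomial_pmf(i, n, p):
    """P(X = i) for a binomial distribution with parameters n and p (0 if i > n)."""
    if i < 0 or i > n:
        return 0.0
    return math.comb(n, i) * p**i * (1 - p)**(n - i)

def mixed_binomial_pmf(i, p, lam, n_max=100):
    """P(Y = i) from (2.104): binomial probabilities weighted by Poisson weights.
    The sum over n is truncated at n_max (an assumption; the tail is negligible
    for moderate lam)."""
    return sum(binomial_pmf(i, n, p) * poisson_pmf(n, lam)
               for n in range(i, n_max + 1))

p, lam = 0.3, 4.0
for i in range(5):
    print(i, mixed_binomial_pmf(i, p, lam), poisson_pmf(i, lam * p))
```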


Mixed Poisson Distribution Let X have a Poisson distribution with parameter $\lambda$:

$$P_X(\lambda) = \left\{ P(X = i) = \frac{\lambda^i}{i!} e^{-\lambda}; \; i = 0, 1, ...; \; \lambda > 0 \right\}.$$

A random variable Y with range {0, 1, ...} is said to have a mixed Poisson distribution if its probability distribution is a mixture of the Poisson distributions $P_X(\lambda)$ with regard to any structure distribution. For instance, if the structure distribution is given by the density $f_L(\lambda)$ of a positive random variable L (i.e., the parameter $\lambda$ of the Poisson distribution is a realization of L), the distribution of Y is

$$P(Y = i) = \int_0^{\infty} \frac{\lambda^i}{i!} e^{-\lambda} f_L(\lambda)\, d\lambda, \quad i = 0, 1, .... \tag{2.107}$$


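A mixed Poisson distributed random variable Y with any structure parameter L has the following properties:

$$E(Y) = E(L),$$
$$Var(Y) = E(L) + Var(L), \tag{2.108}$$
$$P(Y > n) = \int_0^{\infty} \frac{\lambda^n}{n!} e^{-\lambda}\, \bar F_L(\lambda)\, d\lambda,$$

where $F_L(\lambda) = P(L \le \lambda)$ is the distribution function of L and $\bar F_L(\lambda) = 1 - F_L(\lambda)$.

Example 2.25 (mixed Poisson distribution, gamma structure distribution) Let the random structure variable L have a gamma distribution with density

$$f_L(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}, \quad \lambda > 0, \; \alpha > 0, \; \beta > 0.$$

The corresponding mixed Poisson distribution is obtained as follows:

$$P(Y = i) = \int_0^{\infty} \frac{\lambda^i}{i!} e^{-\lambda}\, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda}\, d\lambda$$

$$= \frac{1}{i!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} \lambda^{i + \alpha - 1} e^{-(\beta + 1)\lambda}\, d\lambda$$

$$= \frac{1}{i!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \frac{1}{(\beta + 1)^{i + \alpha}} \int_0^{\infty} x^{i + \alpha - 1} e^{-x}\, dx$$

$$= \frac{1}{i!} \frac{\Gamma(i + \alpha)}{\Gamma(\alpha)} \frac{\beta^{\alpha}}{(\beta + 1)^{i + \alpha}}.$$

Thus,

$$P(Y = i) = \binom{i - 1 + \alpha}{i} \left( \frac{1}{\beta + 1} \right)^{i} \left( \frac{\beta}{\beta + 1} \right)^{\alpha}; \quad \alpha > 0, \; \beta > 0, \; i = 0, 1, .... \tag{2.109}$$
This is a negative binomial distribution with parameters $r = \alpha$ and $p = 1/(\beta + 1)$ (see formula (2.31), page 53). In deriving this result, the following property of the gamma function with $x = i + \alpha$, i = 1, 2, ..., had been used:

$$\Gamma(x) = (x - 1)\, \Gamma(x - 1); \quad x > 0. \qquad \square$$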
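
The probabilities (2.109) can be checked against the moment properties (2.108), using that a gamma distributed L has $E(L) = \alpha/\beta$ and $Var(L) = \alpha/\beta^2$. The following sketch evaluates (2.109) through `math.lgamma`; the parameter values, the truncation at 400 terms, and the function name are illustrative assumptions.

```python
import math

def mixed_poisson_gamma_pmf(i, alpha, beta):
    """P(Y = i) from (2.109), written as
    Gamma(i + alpha) / (i! * Gamma(alpha)) * (1/(beta+1))^i * (beta/(beta+1))^alpha
    and evaluated on the log scale for numerical stability."""
    log_p = (math.lgamma(i + alpha) - math.lgamma(i + 1) - math.lgamma(alpha)
             + alpha * math.log(beta / (beta + 1.0))
             - i * math.log(beta + 1.0))
    return math.exp(log_p)

alpha, beta = 2.5, 1.5
pmf = [mixed_poisson_gamma_pmf(i, alpha, beta) for i in range(400)]  # tail is negligible

total = sum(pmf)
mean = sum(i * p for i, p in enumerate(pmf))
var = sum(i * i * p for i, p in enumerate(pmf)) - mean**2

print(total, mean, var)   # compare with 1, alpha/beta, alpha/beta + alpha/beta**2
```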


2.5 GENERATING FUNCTIONS 


Probability distributions or at least moments of random variables can frequently be obtained from special functions, called (probability or moment) generating functions of random variables or, equivalently, of their probability distributions. This is important because in many applications of stochastic methods it is easier to determine the generating function of a random variable than to determine its probability distribution directly. This will be demonstrated in numerous applications in Part II of this book. The method of determining the probability distribution of a random variable from its generating function is mathematically justified, since to every probability distribution belongs exactly one generating function and vice versa.


Formally, going over from a probability distribution to its generating function is a transformation of this distribution. In this section, transformations are separately considered for discrete random variables (z-transformation) and for continuous random variables (Laplace transformation).

2.5.1 z-Transformation


The discrete random variable X has range R = {0, 1, ...} and probability distribution
$$\{p_i = P(X = i); \; i = 0, 1, ...\}.$$


The z-transform M(z) of X or, equivalently, of its probability distribution is for any real number z with $|z| \le 1$ defined as the power series

$$M(z) = \sum_{i=0}^{\infty} p_i z^i.$$


Thus, the probability distribution of X has been transformed into a power series. In this book, the extension of this series to complex numbers z is not necessary.


To avoid misunderstandings, sometimes the notation $M_X(z)$ is used instead of M(z).
From (2.10) with $h(x_i) = z^i$, M(z) is seen to be the mean value of $Y = z^X$:
$$M(z) = E(z^X). \tag{2.110}$$
M(z) converges absolutely for $|z| \le 1$:
$$|M(z)| \le \sum_{i=0}^{\infty} p_i |z^i| \le \sum_{i=0}^{\infty} p_i = 1.$$
Therefore, M(z) can be differentiated (as well as integrated) term by term:
$$M'(z) = \sum_{i=0}^{\infty} i\, p_i z^{i-1}.$$
Letting z = 1 yields
$$M'(1) = \sum_{i=0}^{\infty} i\, p_i = E(X).$$
Taking the second derivative of M(z) gives
$$M''(z) = \sum_{i=0}^{\infty} (i - 1)\, i\, p_i z^{i-2}.$$
Letting z = 1 yields
$$M''(1) = \sum_{i=0}^{\infty} (i - 1)\, i\, p_i = \sum_{i=0}^{\infty} i^2 p_i - \sum_{i=0}^{\infty} i\, p_i.$$
Therefore, $M''(1) = E(X^2) - E(X)$. Thus, the first two moments of X are
$$E(X) = M'(1), \qquad E(X^2) = M'(1) + M''(1). \tag{2.111}$$
Continuing in this way, all moments of X can be generated by derivatives of M(z). Hence, the power series M(z) is indeed a moment generating function. By (2.13),
$$E(X) = M'(1), \qquad Var(X) = M'(1) + M''(1) - [M'(1)]^2. \tag{2.112}$$
M(z) is also a probability generating function, since
$$p_0 = M(0), \quad p_1 = M'(0), \quad p_2 = \tfrac{1}{2!} M''(0), \quad p_3 = \tfrac{1}{3!} M'''(0), \; ....$$
Generally,
$$p_n = \frac{1}{n!} \left. \frac{d^n M(z)}{dz^n} \right|_{z=0}; \quad n = 0, 1, .... \tag{2.113}$$
Otherwise, according to the definition of M(z), developing a given z-transform with unknown underlying probability distribution into a power series yields the probabilities $p_i$ simply as the coefficients of $z^i$.
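
For a distribution with finitely many probabilities, the derivatives M'(1) and M''(1) are finite sums, so formulas (2.111) and (2.112) translate directly into code. The sketch below is illustrative; the function names and the example distribution (a fair die with faces relabelled 0 to 5) are assumptions and do not come from the text.

```python
def z_transform_derivatives_at_one(pmf):
    """Return (M'(1), M''(1)) for a distribution p_0, p_1, ..., p_m given as a list."""
    d1 = sum(i * p for i, p in enumerate(pmf))            # sum of i * p_i
    d2 = sum(i * (i - 1) * p for i, p in enumerate(pmf))  # sum of (i-1) * i * p_i
    return d1, d2

def mean_and_variance(pmf):
    """Mean and variance from (2.112): Var(X) = M'(1) + M''(1) - [M'(1)]^2."""
    d1, d2 = z_transform_derivatives_at_one(pmf)
    return d1, d1 + d2 - d1**2

# Hypothetical example: uniform distribution on {0, 1, ..., 5}.
die = [1 / 6] * 6
mean, var = mean_and_variance(die)
print(mean, var)
```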


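Geometric Distribution Let X have a geometric distribution with parameter p (page 50):
$$p_i = P(X = i) = p\,(1 - p)^i; \quad i = 0, 1, ....$$
Then,
$$M(z) = \sum_{i=0}^{\infty} p\,(1 - p)^i z^i = p \sum_{i=0}^{\infty} [(1 - p) z]^i.$$
By the geometrical series (2.16) with x = (1 − p)z,

$$M(z) = \frac{p}{1 - (1 - p) z}.$$

The first two derivatives are

$$M'(z) = \frac{p\,(1 - p)}{[1 - (1 - p) z]^2}, \qquad M''(z) = \frac{2p\,(1 - p)^2}{[1 - (1 - p) z]^3}.$$
Hence, by (2.111) and (2.112),
$$E(X) = \frac{1 - p}{p}, \qquad E(X^2) = \frac{(1 - p)(2 - p)}{p^2}, \qquad Var(X) = \frac{1 - p}{p^2}.$$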


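Poisson Distribution Let X have a Poisson distribution with parameter $\lambda$ (page 56):
$$p_i = P(X = i) = \frac{\lambda^i}{i!} e^{-\lambda}; \quad i = 0, 1, ....$$
Then, in view of the exponential series (2.19),
$$M(z) = \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} e^{-\lambda} z^i = e^{-\lambda} \sum_{i=0}^{\infty} \frac{(\lambda z)^i}{i!} = e^{-\lambda} e^{\lambda z}.$$
Hence,
$$M(z) = e^{\lambda (z - 1)}.$$
The first two derivatives are
$$M'(z) = \lambda\, e^{\lambda (z - 1)}, \qquad M''(z) = \lambda^2 e^{\lambda (z - 1)}.$$
Letting z = 1 yields
$$M'(1) = \lambda, \qquad M''(1) = \lambda^2.$$
Thus, mean value, second moment, and variance of X are

$$E(X) = \lambda, \qquad E(X^2) = \lambda(\lambda + 1), \qquad Var(X) = \lambda.$$

Mixed Poisson Distribution The mixed Poisson distribution with density $f_L(\lambda)$ of its structure parameter L has the individual probabilities (formula (2.107))

$$P(Y = i) = \int_0^{\infty} \frac{\lambda^i}{i!} e^{-\lambda} f_L(\lambda)\, d\lambda, \quad i = 0, 1, ....$$
Hence, its z-transform is
$$M_Y(z) = \sum_{i=0}^{\infty} \left( \int_0^{\infty} \frac{\lambda^i}{i!} e^{-\lambda} f_L(\lambda)\, d\lambda \right) z^i = \int_0^{\infty} \left( \sum_{i=0}^{\infty} \frac{(\lambda z)^i}{i!} \right) e^{-\lambda} f_L(\lambda)\, d\lambda$$
so that

$$M_Y(z) = \int_0^{\infty} e^{\lambda (z - 1)} f_L(\lambda)\, d\lambda.$$
This result can be interpreted as 'mixture of z-transforms of Poisson distributions'.
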
Binomial Distribution Let X have a binomial distribution with parameters n and p (page 51):
$$p_i = P(X = i) = \binom{n}{i} p^i (1 - p)^{n-i}; \quad i = 0, 1, ..., n.$$

Then,

$$M(z) = \sum_{i=0}^{n} p_i z^i = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i} z^i = \sum_{i=0}^{n} \binom{n}{i} (pz)^i (1 - p)^{n-i}.$$
This is the binomial series (2.20) with x = pz and y = 1 − p so that
$$M(z) = [pz + (1 - p)]^n.$$
By differentiation,
$$M'(z) = np\,[pz + (1 - p)]^{n-1},$$
$$M''(z) = (n - 1)\,n\,p^2\,[pz + (1 - p)]^{n-2}.$$
Hence,
$$M'(1) = np \quad \text{and} \quad M''(1) = (n - 1)\,n\,p^2$$
so that mean value, second moment, and variance of X are

$$E(X) = np, \qquad E(X^2) = (n - 1)\,n\,p^2 + np, \qquad Var(X) = np\,(1 - p).$$


Convolution Let $\{p_0, p_1, ...\}$ and $\{q_0, q_1, ...\}$ be the respective probability distributions of the discrete random variables X and Y, and let a sequence $\{r_0, r_1, ...\}$ be defined as follows:

$$r_n = \sum_{i=0}^{n} p_i q_{n-i} = p_0 q_n + p_1 q_{n-1} + \cdots + p_n q_0, \quad n = 0, 1, .... \tag{2.114}$$
The sequence $\{r_0, r_1, ...\}$ is called the convolution of the probability distributions of X and Y. The convolution is the probability distribution of a certain random variable Z since $\{r_0, r_1, ...\}$ fulfills the conditions of a discrete probability distribution (2.6):

$$\sum_{n=0}^{\infty} r_n = 1, \qquad r_n \ge 0.$$

For deriving the z-transform of Z, Dirichlet's formula on how to change the order of summation in finite or infinite double sums is needed:

$$\sum_{n=0}^{\infty} \sum_{i=0}^{n} a_{in} = \sum_{i=0}^{\infty} \sum_{n=i}^{\infty} a_{in}. \tag{2.115}$$
Now,
$$M_Z(z) = \sum_{n=0}^{\infty} r_n z^n = \sum_{n=0}^{\infty} \sum_{i=0}^{n} p_i q_{n-i} z^n$$
$$= \sum_{i=0}^{\infty} p_i z^i \left( \sum_{n=i}^{\infty} q_{n-i} z^{n-i} \right)$$
$$= \left( \sum_{i=0}^{\infty} p_i z^i \right) \left( \sum_{k=0}^{\infty} q_k z^k \right).$$


Thus, the z-transform of the convolution of the probability distributions of two random variables X and Y is equal to the product of the z-transforms of the probability distributions of X and Y:

$$M_Z(z) = M_X(z) \cdot M_Y(z). \tag{2.116}$$
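
Relation (2.116) is easy to confirm numerically for distributions with finitely many probabilities. The sketch below implements the convolution (2.114) and evaluates the z-transforms at a few points; the two example distributions and all names are hypothetical.

```python
def convolve(p, q):
    """Convolution (2.114) of two finite probability distributions on {0, 1, ...}."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            r[i + j] += p_i * q_j      # p_i * q_{n-i} contributes to r_n with n = i + j
    return r

def z_transform(pmf, z):
    """M(z) = sum of p_i * z^i for a finite distribution."""
    return sum(p_i * z**i for i, p_i in enumerate(pmf))

p = [0.2, 0.5, 0.3]     # hypothetical distribution of X
q = [0.6, 0.4]          # hypothetical distribution of Y
r = convolve(p, q)

for z in (0.0, 0.3, 0.7, 1.0):
    print(z, z_transform(r, z), z_transform(p, z) * z_transform(q, z))
```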


2.5.2 Laplace Transformation 


The Laplace transform $\hat f(s)$ of a real function f(x), $x \in (-\infty, +\infty)$, is defined as

$$\hat f(s) = \int_{-\infty}^{+\infty} e^{-sx} f(x)\, dx, \tag{2.117}$$
where the parameter s is a complex number.


The Laplace transform of a function need not exist. The following assumptions 1) and 2) make sure that this function exists for all s with Re(s) > b:


1) f(x) is piecewise continuous.


2) There exist finite real constants a and b so that $|f(x)| \le a\, e^{bx}$ for all x > 0.


Notation If c = x + iy is any complex number (i.e., $i = \sqrt{-1}$ and x, y are real numbers), then Re(c) denotes the real part of c: Re(c) = x. For the applications dealt with in this book, the parameter s can be assumed to be a real number.


If f(x) is the density of a random variable X, then $\hat f(s)$ has a simple meaning:
$$\hat f(s) = E(e^{-sX}). \tag{2.118}$$


This formula is identical to (2.110) if there z is written in the form $z = e^{-s}$.
The n-fold derivative of $\hat f(s)$ with respect to s is

$$\frac{d^n \hat f(s)}{ds^n} = (-1)^n \int_{-\infty}^{+\infty} x^n e^{-sx} f(x)\, dx.$$


Hence, the moments of all orders of X can be obtained from $E(X^0) = E(1) = 1$ and

$$E(X^n) = (-1)^n \left. \frac{d^n \hat f(s)}{ds^n} \right|_{s=0}, \quad n = 1, 2, .... \tag{2.119}$$
Sometimes it is more convenient to use the notation
$$\hat f(s) = L(f, s).$$

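Partial integration in $\hat f(s)$ yields

$$L\left( \int_{-\infty}^{x} f(u)\, du,\; s \right) = \frac{1}{s} \hat f(s) \tag{2.120}$$

and, if f(x) > 0 for all $x \in (-\infty, +\infty)$ and $f^{(n)}(x)$ denotes the nth derivative of f(x) with regard to x, then

$$L(f^{(n)}, s) = s^n \hat f(s); \quad n = 1, 2, .... \tag{2.121}$$
Note This equation has to be modified for all n = 1, 2, ... if f(x) = 0 for x < 0:
$$L(f^{(n)}, s) = s^n \hat f(s) - s^{n-1} f(0) - s^{n-2} f'(0) - \cdots - s f^{(n-2)}(0) - f^{(n-1)}(0). \tag{2.122}$$


In particular, for n = 1,

$$L\left( \frac{df(x)}{dx},\; s \right) = s \hat f(s) - f(0). \tag{2.123}$$
Let $f_1, f_2, ..., f_n$ be any n functions for which the corresponding Laplace transforms exist and $f = f_1 + f_2 + \cdots + f_n$. Then,
$$\hat f(s) = \hat f_1(s) + \hat f_2(s) + \cdots + \hat f_n(s). \tag{2.124}$$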


Convolution The convolution $f_1 * f_2$ of two continuous functions $f_1$ and $f_2$, which are defined on $(-\infty, +\infty)$, is given by

$$(f_1 * f_2)(x) = \int_{-\infty}^{+\infty} f_1(x - u)\, f_2(u)\, du. \tag{2.125}$$
The convolution is a commutative operation, i.e.,
$$(f_1 * f_2)(x) = (f_2 * f_1)(x) = \int_{-\infty}^{+\infty} f_2(x - u)\, f_1(u)\, du.$$
If $f_1(x) = f_2(x) = 0$ for all x < 0, then
$$(f_1 * f_2)(x) = \int_0^{x} f_1(x - u)\, f_2(u)\, du = \int_0^{x} f_2(x - u)\, f_1(u)\, du. \tag{2.126}$$
The following formula is the 'continuous' analogue to (2.116):

$$L(f_1 * f_2, s) = L(f_1, s)\, L(f_2, s) = \hat f_1(s)\, \hat f_2(s). \tag{2.127}$$
A proof of this relationship is easily established:
$$L(f_1 * f_2, s) = \int_{-\infty}^{+\infty} e^{-sx} \int_{-\infty}^{+\infty} f_2(x - u)\, f_1(u)\, du\, dx$$
$$= \int_{-\infty}^{+\infty} e^{-su} f_1(u) \int_{-\infty}^{+\infty} e^{-s(x - u)} f_2(x - u)\, dx\, du$$
$$= \int_{-\infty}^{+\infty} e^{-su} f_1(u) \int_{-\infty}^{+\infty} e^{-sy} f_2(y)\, dy\, du$$
$$= L(f_1, s)\, L(f_2, s) = \hat f_1(s)\, \hat f_2(s).$$


In proving this relationship, the 'continuous version' of Dirichlet's formula (2.115) had been applied:

$$\int_0^{z} \int_0^{y} f(x, y)\, dx\, dy = \int_0^{z} \int_x^{z} f(x, y)\, dy\, dx.$$


Verbally, equation (2.127) means that the Laplace transform of the convolution of two functions is equal to the product of the Laplace transforms of these functions.


Retransformation The Laplace transform $\hat f(s)$ is called the image of f(x), and f(x) is the preimage of $\hat f(s)$. Finding the preimage of a given Laplace transform (retransformation) can be a difficult task. Properties (2.124) and (2.127) of the Laplace transformation suggest that Laplace transforms should be decomposed as far as possible into terms and factors (for instance, decomposing a fraction into partial fractions), because the retransformation of the arising less complex terms is usually easier than the retransformation of the original image.


Retransformation is facilitated by correspondence tables. These tables contain important functions (preimages) and their Laplace transforms. Table 2.5 presents a selection of Laplace transforms, which are given by rational functions in s, and their preimages. There exists, moreover, an explicit formula for the preimages of Laplace transforms. Its application requires knowledge of complex calculus.


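Example 2.26 Let X have an exponential distribution with parameter $\lambda$:
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0.$$
The Laplace transform of f(x) is
$$\hat f(s) = \int_0^{\infty} e^{-sx} \lambda e^{-\lambda x}\, dx = \lambda \int_0^{\infty} e^{-(s + \lambda) x}\, dx$$

so that
$$\hat f(s) = \frac{\lambda}{s + \lambda}.$$
The nth derivative of $\hat f(s)$ is
$$\frac{d^n \hat f(s)}{ds^n} = (-1)^n \frac{\lambda\, n!}{(s + \lambda)^{n+1}}.$$
Thus, the nth moment of X is

$$E(X^n) = \frac{n!}{\lambda^n}; \quad n = 0, 1, .... \qquad \square$$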
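
Formula (2.119) can be illustrated numerically for Example 2.26 by replacing the derivatives at s = 0 with central difference quotients. This is a rough sketch only; the step size `h`, the value of `lam`, and the function names are assumptions chosen for illustration.

```python
def exponential_laplace(s, lam):
    """Laplace transform lam / (s + lam) of the exponential density of Example 2.26."""
    return lam / (s + lam)

def first_two_moments(transform, h=1e-3):
    """Approximate E(X) and E(X^2) via (2.119) using central differences at s = 0."""
    d1 = (transform(h) - transform(-h)) / (2 * h)                   # first derivative
    d2 = (transform(h) - 2 * transform(0.0) + transform(-h)) / h**2  # second derivative
    return -d1, d2          # (-1)^1 * d1 and (-1)^2 * d2

lam = 2.0
m1, m2 = first_two_moments(lambda s: exponential_laplace(s, lam))
print(m1, m2)   # compare with 1/lam and 2/lam**2
```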
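Example 2.27 Let X have a normal distribution with density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}; \quad x \in (-\infty, +\infty).$$
The Laplace transform of f(x) is

$$\hat f(s) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{+\infty} e^{-sx}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx.$$


This improper parameter integral exists for all s. Substituting $u = (x - \mu)/\sigma$ yields

$$\hat f(s) = \frac{1}{\sqrt{2\pi}}\, e^{-\mu s} \int_{-\infty}^{+\infty} e^{-\sigma s u}\, e^{-u^2/2}\, du = \frac{1}{\sqrt{2\pi}}\, e^{-\mu s + \frac{1}{2}\sigma^2 s^2} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(u + \sigma s)^2}\, du.$$
By substituting $y = u + \sigma s$, the second integral is seen to be $\sqrt{2\pi}$. Hence,
$$\hat f(s) = e^{-\mu s + \frac{1}{2}\sigma^2 s^2}. \tag{2.128}$$
□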
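
Result (2.128) can be compared with a direct numerical evaluation of the integral (2.117) for the normal density. The sketch below uses the midpoint rule on the interval $\mu \pm 15\sigma$; this interval, the step count, the parameter values, and the function names are assumptions that are adequate only for moderate values of s.

```python
import math

def normal_density(x, mu, sigma):
    """Density of the normal distribution with parameters mu and sigma."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_laplace_numeric(s, mu, sigma, steps=20_000):
    """Midpoint-rule approximation of the integral of exp(-s*x) * f(x) over mu +/- 15 sigma."""
    lower = mu - 15 * sigma
    h = 30 * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += math.exp(-s * x) * normal_density(x, mu, sigma)
    return total * h

def normal_laplace_closed(s, mu, sigma):
    """Closed form (2.128)."""
    return math.exp(-mu * s + 0.5 * sigma**2 * s**2)

for s in (-0.5, 0.0, 0.7):
    print(s, normal_laplace_numeric(s, 1.0, 2.0), normal_laplace_closed(s, 1.0, 2.0))
```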


Two important special cases of the Laplace transform are the characteristic function 
and the moment generating function. 


Characteristic Function The characteristic function
$$\psi(y) = \int_{-\infty}^{+\infty} e^{-iyx} f(x)\, dx$$

of a random variable with density f(x) is a special case of its Laplace transform, namely if the parameter s is a purely imaginary number, i.e. s = iy. Thus, the characteristic function is nothing else but the Fourier transform of f(x). The advantage of the characteristic function over the Laplace transform is that it always exists:

$$|\psi(y)| = \left| \int_{-\infty}^{+\infty} e^{-iyx} f(x)\, dx \right| \le \int_{-\infty}^{+\infty} \left| e^{-iyx} \right| f(x)\, dx = \int_{-\infty}^{+\infty} f(x)\, dx = 1.$$


Like the z-transform and the Laplace transform, the characteristic function is moment and probability generating. Characteristic functions belong to the most important tools for solving theoretical and practical problems in probability theory.

Moment Generating Function Formally, the moment generating function M(·) is defined exactly as the Laplace transform $\hat f(s)$, namely by formula (2.117). The difference is that in case of the moment generating function the parameter s is always real and usually denoted as '−t' so that

$$M(t) = \int_{-\infty}^{+\infty} e^{tx} f(x)\, dx.$$
The key properties derived for Laplace transforms are of course also valid for the moment generating function. In particular, if f(x) is a probability density, then
$$M(t) = E(e^{tX}).$$


The terminology is a bit confusing, since, as mentioned before, the z-transform, the Laplace transform, and the characteristic function of a random variable are all moment- as well as probability generating.


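Example 2.28 Let an image function be given by

$$\hat f(s) = \frac{s}{(s^2 - 1)^2}.$$
$\hat f(s)$ can be written as

$$\hat f(s) = \frac{s}{s^2 - 1} \cdot \frac{1}{s^2 - 1} = \hat f_1(s) \cdot \hat f_2(s).$$
The preimages of the factors can be found by means of Table 2.5:

$$f_1(x) = \cosh x = \tfrac{1}{2}\,(e^{x} + e^{-x})$$

and
$$f_2(x) = \sinh x = \tfrac{1}{2}\,(e^{x} - e^{-x}).$$


Let $f_1(x)$ and $f_2(x)$ be 0 for all x < 0. Then the preimage f(x) of $\hat f(s)$ is given by the convolution (2.126) of $f_1(x)$ and $f_2(x)$:

$$(f_1 * f_2)(x) = \tfrac{1}{4} \int_0^{x} \left( e^{x - u} + e^{-(x - u)} \right) \left( e^{u} - e^{-u} \right) du$$

$$= \tfrac{1}{4} \left[ \int_0^{x} e^{x} \left( 1 - e^{-2u} \right) du + \int_0^{x} e^{-x} \left( e^{2u} - 1 \right) du \right]$$

$$= \tfrac{1}{4} \left[ e^{x} \left( u + \tfrac{1}{2} e^{-2u} \right) + e^{-x} \left( \tfrac{1}{2} e^{2u} - u \right) \right]_0^{x}.$$

Thus,
$$f(x) = \tfrac{1}{2}\, x \sinh x.$$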
This verifies the preimage given in Table 2.5 with a = 1. □

Example 2.29 Let an image function be given by
$$\hat f(s) = \frac{s}{(s^2 - 1)(s + 2)^2}.$$
The preimage cannot be taken from Table 2.5. But as in the previous example, it can be determined by factorization. Here, however, the method of decomposition of $\hat f(s)$ into partial fractions is used: The denominator has the simple zeros s = 1, s = −1 and the double zero s = −2. Hence, $\hat f(s)$ can be written in the form
$$\frac{s}{(s^2 - 1)(s + 2)^2} = \frac{A_1}{s - 1} + \frac{A_2}{s + 1} + \frac{B_1}{s + 2} + \frac{B_2}{(s + 2)^2}.$$


The coefficients $A_1$, $A_2$, $B_1$, and $B_2$ are determined by multiplying the equation by $(s^2 - 1)(s + 2)^2$ and subsequent comparison of the coefficients of $s^n$; n = 0, 1, 2, 3; on both sides. This gives the equations

$$s^0: \quad 4A_1 - 4A_2 - 2B_1 - B_2 = 0$$
$$s^1: \quad 8A_1 - B_1 = 1$$
$$s^2: \quad 5A_1 + 3A_2 + 2B_1 + B_2 = 0$$
$$s^3: \quad A_1 + A_2 + B_1 = 0$$


The solution is
$$A_1 = 1/18, \quad A_2 = 1/2, \quad B_1 = -5/9, \quad B_2 = -2/3.$$
Therefore,

$$\hat f(s) = \frac{1}{18} \cdot \frac{1}{s - 1} + \frac{1}{2} \cdot \frac{1}{s + 1} - \frac{5}{9} \cdot \frac{1}{s + 2} - \frac{2}{3} \cdot \frac{1}{(s + 2)^2}.$$
The preimage of the last term can be found in Table 2.5. If no table is available, then this term is represented as
$$\frac{1}{(s + 2)^2} = \frac{1}{s + 2} \cdot \frac{1}{s + 2}.$$


The preimage of each factor is $e^{-2x}$ so that the preimage of $1/(s + 2)^2$ is equal to the convolution of $e^{-2x}$ with itself:
$$e^{-2x} * e^{-2x} = \int_0^{x} e^{-2(x - y)}\, e^{-2y}\, dy = \int_0^{x} e^{-2x}\, dy = x\, e^{-2x}.$$


Now, by (2.124), retransformation of the image $\hat f(s)$ can be done term by term:


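$$f(x) = \tfrac{1}{18}\, e^{x} + \tfrac{1}{2}\, e^{-x} - \tfrac{5}{9}\, e^{-2x} - \tfrac{2}{3}\, x\, e^{-2x}. \qquad \square$$

$\hat f(s)$ | preimage $f(x)$
$1/s$ | $1$
$1/s^n,\; n \ge 1$ | $x^{n-1}/(n-1)!$
$1/(s+a)$ | $e^{-ax}$
$1/(s+a)^n$ | $x^{n-1} e^{-ax}/(n-1)!$
$s/(s+a)^2$ | $(1 - ax)\, e^{-ax}$
$s/(s+a)^3$ | $(1 - \tfrac{a}{2} x)\, x\, e^{-ax}$
[illegible in source] | [illegible in source]
$1/(s^2 - a^2)$ | $\tfrac{1}{a} \sinh(ax)$
$1/(s^2 + a^2)$ | $\tfrac{1}{a} \sin(ax)$
$s/(s^2 - a^2)$ | $\cosh(ax)$
$s/(s^2 + a^2)$ | $\cos(ax)$
$1/[s(s+a)]$ | $\tfrac{1}{a}\,(1 - e^{-ax})$
$1/[s^2(s+a)]$ | $\tfrac{1}{a^2}\,(e^{-ax} + ax - 1)$
$1/(s^2 + a^2)^2$ | $\tfrac{1}{2a^3}\,(\sin ax - ax \cos ax)$
$s/(s^2 + a^2)^2$ | $\tfrac{1}{2a}\, x \sin ax$
$s^2/(s^2 + a^2)^2$ | $\tfrac{1}{2a}\,(\sin ax + ax \cos ax)$
$1/(s^2 - a^2)^2$ | $\tfrac{1}{2a^3}\,(ax \cosh ax - \sinh ax)$
$s/(s^2 - a^2)^2$ | $\tfrac{1}{2a}\, x \sinh ax$
$s^2/(s^2 - a^2)^2$ | $\tfrac{1}{2a}\,(\sinh ax + ax \cosh ax)$
$1/[(s+a)(s+b)]$ | $\tfrac{1}{b-a}\,(e^{-ax} - e^{-bx})$
$s/[(s+a)(s+b)]$ | $\tfrac{1}{b-a}\,(b\, e^{-bx} - a\, e^{-ax})$
$1/[(s+a)(s+b)^2]$ | $\tfrac{1}{(b-a)^2}\,(e^{-ax} - e^{-bx} - (b-a)\, x\, e^{-bx})$
$s/[(s+a)(s+b)^2]$ | $\tfrac{1}{(b-a)^2}\,\{-a\, e^{-ax} + [a + b(b-a)x]\, e^{-bx}\}$
$s^2/[(s+a)(s+b)^2]$ | $\tfrac{1}{(b-a)^2}\,[a^2 e^{-ax} + b\,(b - 2a - b^2 x + abx)\, e^{-bx}]$
$1/[s(s+a)^2]$ | $\tfrac{1}{a^2}\,(1 - e^{-ax} - ax\, e^{-ax})$
$1/[s(s+a)(s+b)]$ | $\tfrac{1}{ab(a-b)}\,[a(1 - e^{-bx}) - b(1 - e^{-ax})]$
$1/[(s+a)(s+b)(s+c)]$ | $\tfrac{1}{(a-b)(b-c)(c-a)}\,[(c-b)\, e^{-ax} + (a-c)\, e^{-bx} + (b-a)\, e^{-cx}]$
$s/[(s+a)(s+b)(s+c)]$ | $\tfrac{1}{(a-b)(b-c)(c-a)}\,[a(b-c)\, e^{-ax} + b(c-a)\, e^{-bx} + c(a-b)\, e^{-cx}]$
$s^2/[(s+a)(s+b)(s+c)]$ | $\tfrac{1}{(a-b)(b-c)(c-a)}\,[-a^2(b-c)\, e^{-ax} - b^2(c-a)\, e^{-bx} - c^2(a-b)\, e^{-cx}]$
$1/[(s+a)(s^2 + b^2)]$ | $\tfrac{1}{a^2 + b^2}\,[e^{-ax} + \tfrac{a}{b} \sin bx - \cos bx]$
$s/[(s+a)(s^2 + b^2)]$ | $\tfrac{1}{a^2 + b^2}\,[-a\, e^{-ax} + a \cos bx + b \sin bx]$
$s^2/[(s+a)(s^2 + b^2)]$ | $\tfrac{1}{a^2 + b^2}\,[a^2 e^{-ax} - ab \sin bx + b^2 \cos bx]$


Table 2.5 Images and the corresponding preimages of the Laplace transformation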
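
The partial fraction decomposition of Example 2.29 can be verified with exact rational arithmetic. The sketch below assumes that the image function is $s/((s^2 - 1)(s + 2)^2)$, which is the reading consistent with the coefficient equations of the example; the function names and test points are illustrative.

```python
from fractions import Fraction

def image(s):
    """Image function of Example 2.29 (assumed form): s / ((s^2 - 1) * (s + 2)^2)."""
    s = Fraction(s)
    return s / ((s**2 - 1) * (s + 2)**2)

def partial_fractions(s):
    """Sum of the four partial fractions with A1 = 1/18, A2 = 1/2, B1 = -5/9, B2 = -2/3."""
    s = Fraction(s)
    a1, a2 = Fraction(1, 18), Fraction(1, 2)
    b1, b2 = Fraction(-5, 9), Fraction(-2, 3)
    return a1 / (s - 1) + a2 / (s + 1) + b1 / (s + 2) + b2 / (s + 2)**2

# Exact comparison at a few points away from the zeros 1, -1, -2 of the denominator.
for s in (0, 2, 3, 5, Fraction(1, 2), -3):
    print(s, image(s) == partial_fractions(s))
```

Because both sides are rational functions whose difference has a numerator of degree at most three, agreement at more than three points already implies that the decomposition is an identity.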
2.6 EXERCISES


Sections 2.1 and 2.2 
2.1) An ornithologist measured the weight of 132 eggs of helmeted guinea fowls [gram]:

number i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
weight x_i | 38 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 50
number of eggs n_i | 4 | 6 | 7 | 10 | 13 | 26 | 33 | 16 | 10 | 7


There are no eggs weighing less than 38 and more than 50. Let X be the weight of a 
randomly picked egg from this sample. 


(1) Determine the probability distribution of X. 
(2) Draw the distribution function of X. 
(3) Determine the probabilities P(43 < X < 48) and P(X > 45). 


(4) Determine $E(X)$, $\sqrt{Var(X)}$, and $E(|X - E(X)|)$.


2.2) 114 nails are classified by length:

number i | | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
length (in mm) x_i | <15.0 | 15.0 | 15.1 | 15.2 | 15.3 | 15.4 | 15.5 | 15.6 | >15.6
number of nails n_i | 0 | 3 | 10 | 25 | 40 | 18 | 16 | 2 | 0

Let X denote the length of a nail selected randomly from this population.

(1) Determine the probability distribution of X.

(2) Determine the probabilities P(X < 15.1) and P(15.0 < X < 15.5).

(3) Determine $E(X)$, $m_3 = E(X - E(X))^3$, $\sigma = \sqrt{Var(X)}$, $\gamma_C$, and $\gamma_P$.

Interpret the skewness measures.

2.3) A set of 100 coins from an ongoing production process had been sampled and their diameters measured. The measurement procedure allows for a degree of accuracy of ±0.04 mm. The table shows the measured values x_i and their numbers n_i:

i | 1 | 2 | 3 | 4 | 5 | 6 | 7
x_i | 24.88 | 24.92 | 24.96 | 25.00 | 25.04 | 25.08 | 25.12
n_i | 2 | 6 | 20 | 40 | 22 | 8 | 2

Let X be the diameter of a coin picked at random from this set.
(1) Draw the distribution function of X.
(2) Determine $E(X)$, $E(|X - E(X)|)$, $Var(X)$, and $V(X)$.

2.4) 84 specimen copies of soft coal, sampled from the ongoing production in a colliery over a period of 7 days, had been analyzed with regard to ash and water content, respectively [in %]. Both ash and water content have been partitioned into 6 classes. The table shows the results:

ash \ water | [16, 17) | [17, 18) | [18, 19) | [19, 20) | [20, 21) | [21, 22]
[23, 24) | 0 | 0 | 1 | 1 | 2 | 4
[24, 25) | 0 | 1 | 3 | 4 | 3 | 3
[25, 26) | 0 | 2 | 8 | 7 | 2 | 1
[26, 27) | 1 | 4 | 10 | 8 | 1 | 0
[27, 28) | 0 | 5 | 4 | 4 | 0 | 0
[28, 29] | 2 | 0 | 1 | 0 | 1 | 0

Let X be the water content and Y be the ash content of a randomly chosen specimen copy out of the 84 ones. Since the originally measured values are not given, it is assumed that the values, which X and Y can take on, are the centers of the given classes, i.e., 16.5, 17.5, ..., 21.5.


(1) Draw the distribution functions of X and Y. 
(2) Determine E(X), Var(X), E(Y), and Var(Y). 


2.5) It costs $50 to find out whether a spare part required for repairing a failed device 
is faulty or not. Installing a faulty spare part causes damage of $1000. 


Is it on average more profitable to use a spare part without checking if 
(1) 1% of all spare parts of that type, 

(2) 3% of all spare parts of that type, and 

(3) 10 % of all spare parts of that type are faulty? 


2.6) Market analysts predict that a newly developed product in design 1 will bring in 
a profit of $500 000, whereas in design 2 it will bring in a profit of $ 200 000 with 
probability 0.4, and a profit of $800 000 with probability 0.6. 


What design should the producer prefer? 


2.7) Let X be the random number of times one has to throw a die until a 6 occurs for the first time. Determine E(X) and Var(X).


2.8) 2% of the citizens of a country are HIV-positive. Test persons are selected at 
random from the population and checked for their HIV-status. 

What is the mean number of persons who have to be checked until an HIV-positive person is found for the first time?

2.9) Let X be the difference between the number of heads and the number of tails if a coin is flipped 10 times.

(1) What is the range of X? 

(2) Determine the probability distribution of X. 


2.10) A locksmith stands in front of a locked door. He has 9 keys and knows that 
only one of them fits, but he has otherwise no a priori knowledge. He tries the keys 
one after the other. 


What is the mean number of trials till the door opens? 


2.11) A submarine attacks a warship with 8 torpedoes. The torpedoes hit the warship 
independently of each other with probability 0.8. Any successful torpedo hits one of 
the 8 submerged chambers of the ship independently of other successful ones with 
probability 1/8. The chambers are isolated from each other. In case of one or more 
hits, a chamber fills up with water. The ship will sink if at least 3 chambers are hit by
one or more torpedoes. What is the probability that the attack sinks the warship?


2.12) Three hunters shoot at 3 partridges. Every hunter, independently of the others, 
takes aim at a randomly selected partridge and hits his/her target with probability 1. 
Thus, a partridge may be hit by several pellets, whereas lucky ones escape a hit. 


Determine the mean E(X) of the random number X of hit partridges. 


2.13) A lecturer, for having otherwise no merits, claims to be equipped with extra- 
sensory powers. His students have some doubt about it and ask him to predict the 
outcomes of ten flippings of a fair coin. The lecturer is five times successful. Do you 
believe that, based on this test, the claim of the lecturer is justified? 


2.14) Let X have a binomial distribution with parameters n = 5 and p = 0.4.
(1) Draw the distribution function of X.
(2) Determine the probabilities

P(X > 6), P(X < 2), P(3 < X < 7), P(X > 3 | X < 2), and P(X < 3 | X = 4).


2.15) Let X have a binomial distribution with parameters n = 10 and p. 
Determine an interval I so that P(X = 2) < P(X = 3) for all p € I. 


2.16) The stop sign at an intersection is on average ignored by 4% of all cars. A car, 
which ignores the stop sign, causes an accident with probability 0.01. Assuming inde- 
pendent behavior of the car drivers: 

(1) What is the probability that from 100 cars at least 3 ignore the stop sign? 

(2) What is the probability that at least one of the 100 cars causes an accident due to 
ignoring the stop sign? 



2.17) In a supermarket, Tessa bought a dozen eggs that were claimed to be fresh-laid farm eggs. There are 2 rotten eggs amongst them. For breakfast she boils 2 eggs.


What is the probability that her breakfast is spoilt if even one bad egg has this effect?


2.18) A smart baker mixes 20 stale breads from the previous days with 100 freshly 
baked ones and offers this mixture for sale. Tessa randomly chooses 3 breads from 
the 120, i.e., she does not feel and smell them. What is the probability that she has
bought at least one stale bread? 


2.19) Some of the 270 spruces of a small forest stand are infested with rot (a fungus 
affecting first the core of the stems). Samples are taken from the stems of 30 random- 
ly selected trees. 

(1) If 24 trees from the 270 are infested, what is the probability that there are less than 
4 infested trees in the sample? 

Determine this probability both by the binomial approximation and by the Poisson 
approximation to the hypergeometric distribution. 

(2) If the sample contains six infested trees, what is the most likely number of infest- 
ed trees in the forest stand (see example 2.7)? 


2.20) Because it happens that one or more airline passengers do not show up for their 
reserved seats, an airline would sell 602 tickets for a flight that holds only 600 pas- 
sengers. The probability that, for some reason or other, a passenger does not show up 
is 0.008. 


What is the probability that every passenger who shows up will have a seat? 


2.21) Flaws are randomly located along the length of a thin copper wire. The number 
of flaws follows a Poisson distribution with a mean of 0.15 flaws per cm. What is the 
probability $p_{\ge 2}$ of at least 2 flaws in a section of length 10 cm?


2.22) The random number of crackle sounds produced per hour by an old radio has a 
Poisson distribution with parameter A = 12. 


What is the probability that there is no crackle sound during the 4 minutes transmis- 
sion of a listener's favorite hit? 


2.23) The random number of tickets car driver Odundo receives has a Poisson distri- 
bution with parameter A = 2 a year. In the current year, Odundo had received his first 
ticket on the 31st of March. 


What is the probability that he will receive another ticket in that year? 


2.24) Let X have a Poisson distribution with parameter $\lambda$.
For which nonnegative integer n is the probability $p_n = P(X = n)$ maximal?

2.25) In 100 kg of a low-grade molten steel tapping there are on average 120 impurities. Castings weighing 1 kg are manufactured from this raw material. What is the probability that there are at least 2 impurities in a casting if the spatial distribution of the impurities in the raw material is assumed to be Poisson?


2.26) In a piece of fabric of length 100m there are on average 10 flaws. These flaws 
are assumed to be Poisson distributed over the length. The 100m of fabric are cut in 
pieces of length 4m. 


What percentage of the 4m cuts can be expected to be without flaws? 


2.27) Let X have a binomial distribution with parameters n and p. Compare the following
exact probabilities with the corresponding Poisson approximations and give reasons 
for possible larger deviations: 

(1) P(X=2) for n= 20, p=0.1, 

(2) P(X=2) for n= 20, p=0.9, 

(3) P(X=0) for n= 10, p=0.1, 

(4) P(X=3) for n= 20, p=0.4. 


2.28) A random variable X has range $R = \{x_1, x_2, ..., x_m\}$ and probability distribution
$$\{p_k = P(X = x_k); \; k = 1, 2, ..., m\}, \quad \sum_{k=1}^{m} p_k = 1.$$


A random experiment with outcome X is repeated n times. The outcome of the kth repetition has no influence on the outcome of the (k+1)th one, k = 1, 2, ..., n − 1. Show that the probability of the event

{$x_1$ occurs $n_1$ times, $x_2$ occurs $n_2$ times, ..., $x_m$ occurs $n_m$ times}

is given by

$$\frac{n!}{n_1!\, n_2! \cdots n_m!}\; p_1^{n_1} p_2^{n_2} \cdots p_m^{n_m} \quad \text{with} \quad \sum_{k=1}^{m} n_k = n.$$


This probability distribution is called the multinomial distribution. It contains as a special case the binomial distribution (m = 2).


2.29) A branch of the PROFIT-Bank has found that on average 68% of its customers 
visit the branch for routine money matters (type 1-visitors), 14% are there for invest- 
ment matters (type 2-visitors), 9% need a credit (type 3-visitors), 8% need foreign 
exchange service (type 4-visitors), and 1% only make a suspicious impression or 
even carry out a robbery (type 5-visitors). 


(1) What is the probability that amongst 10 randomly chosen visitors 5, 3, 1, 1, and 0 
are of type 1, 2, 3, 4, or 5, respectively? 
(2) What is the probability that amongst 12 randomly chosen visitors 4, 3, 3, 1, and 1 
are of type 1, 2, 3, 4, or 5, respectively? 
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Section 2.3 


2.30) Let F(x) and f(x) be the distribution function and the probability density,
respectively, of a random variable X. Answer the following questions with yes or no:


(1) F(x) and f(x) can be arbitrary real functions. 

(2) f(x) is a nondecreasing function. 

(3) f(x) cannot have jumps. 

(4) f(x) cannot be negative. 

(5) F(x) is always a continuous function. 

(6) F(x) can assume values between —1 and +1. 

(7) The area between the abscissa and the graph of F(x) is always equal to 1. 
(8) f(x) must always be smaller than 1. 

(9) The area between the abscissa and the graph of f(x) is always equal to 1. 
(10) The properties of F(x) and f(x) are all the same to me. 


2.31) Check whether by suitable choice of the parameter a the following functions 
are densities of random variables. If the answer is yes, determine the respective dis- 
tribution functions, mean values, variances, medians, and modes. 


(1) $f(x) = a\,|x|$, $-3 \le x \le +3$,
(2) f(x) = axe™ , x>0, 
(3) $f(x) = a \sin x$, $0 \le x \le \pi$,
(4) $f(x) = a \cos x$, $0 \le x \le \pi$.

2.32) (1) Show that $f(x) = \frac{1}{2\sqrt{x}}$, $0 < x \le 1$, is a probability density.
(2) Draw the graph of the corresponding distribution function and determine the cor- 
responding 0.1, 0.5, and the 0.9-percentiles. Check whether the mean value exists. 


2.33) Let X be a continuous random variable. Confirm or deny the following state- 
ments: 


(1) The probability P(X = E(X)) is always positive. 

(2) There is always Var(X) < 1. 

(3) Var(X) can be negative if X can assume negative values. 
(4) E(X) is never negative. 


2.34) The current which flows through a thin copper wire is uniformly distributed in 
the interval [0, 10] (in mA). For safety reasons, the current should not fall below the 
crucial level of 4mA. 


What is the probability that at any randomly chosen time point the current is below 
4mA? 
2.35) According to the timetable, a lecture begins at 8:15 a.m. The arrival time of
Professor Wisdom in the venue is uniformly distributed between 8:13 and 8:20, 
whereas the arrival time of student Sluggish is uniformly distributed over the time 
interval from 8:05 to 8:30. 


What is the probability that Sluggish arrives after Wisdom in the venue? 


2.36) A road traffic light is switched on every day at 5:00 a.m. It always begins with 
red and holds this colour for two minutes. Then it changes to yellow and holds this 
colour for 30 seconds before it switches to green to hold this colour for 2.5 minutes. 
This cycle continues till midnight. 


(1) A car driver arrives at this traffic light at a time point which is uniformly distri- 
buted between 9:00 and 9:10 a.m. What is the probability that the driver catches the 
green light period? 

(2) Determine the same probability on condition that the driver's arrival time point has 
a uniform distribution over the interval [8:58, 9:08]. 


2.37) A continuous random variable X has the probability density 


$$f(x) = \begin{cases} 1/4 & \text{for } 0 \le x \le 2, \\ 1/2 & \text{for } 2 < x \le 3. \end{cases}$$


Determine $Var(X)$ and $E(|X - E(X)|)$.


2.38) A continuous random variable X has the probability density 
$f(x) = 2x$, $0 \le x \le 1$.

(1) Draw the corresponding distribution function.

(2) Determine and compare the measures of variability


$E(|X - E(X)|)$ and $\sqrt{Var(X)}$.


2.39) The lifetime X of a bulb has an exponential distribution with a mean value of 
E(X) = 8000 hours. Calculate the probabilities 
P(X < 4000), P(X > 12000), P(7000 < X < 9000), and P(X < 4000) 


(time limits in hours). 


2.40) The lifetimes of 5 identical bulbs are exponentially distributed with parameter 
$\lambda = 1.25 \cdot 10^{-4}$ [$h^{-1}$].
All of them are switched on at time t = 0 and will fail independently of each other.


(1) What is the probability that at time t = 8000 hours a) all 5 bulbs and b) at least 3
bulbs have failed? 


(2) What is the probability that at least one bulb survives 12 000 hours? 
2.41) The period of employment of staff in a certain company has an exponential
distribution with the property that 92% of staff leave the company after only 16 months.
What is the mean time an employee is with this company and the corresponding stand- 
ard deviation? 


2.42) The times between the arrivals of taxis at a rank are independent and have an 
exponential distribution with parameter $\lambda = 4$ [$h^{-1}$]. An arriving customer does not
find an available taxi and the previous one left 3 minutes earlier. No other customers 
are waiting. What is the probability that the customer has to wait at least 5 minutes 
for the next free taxi? 


2.43) A small branch of a bank has the two tellers 1 and 2. The service times at these 
tellers are independent and exponentially distributed with parameter $\lambda = 0.4$ [$min^{-1}$].
When Pumeza arrives, the tellers are occupied by a customer each. So she has to wait. 
Teller 1 is the first to become free, and the service of Pumeza starts immediately. 


What is the probability that the service of Pumeza is finished sooner than the service 
of the customer at teller 2? 


2.44) Four weeks later Pumeza visits the same branch as in exercise 2.43. Now the 
service times at tellers 1 and 2 are again independent, but exponentially distributed 
with respective parameters $\lambda_1 = 0.4$ [$min^{-1}$] and $\lambda_2 = 0.2$ [$min^{-1}$].

(1) When Pumeza enters the branch, both tellers are occupied and no customer is wait- 
ing. What is the mean time Pumeza spends in the branch till the end of her service? 


(2) When Pumeza enters the branch, both tellers are occupied, and another customer 
is waiting for service. What is the mean time Pumeza spends in the branch till the end 
of her service? (Pumeza does not get preferential service.) 


2.45) An insurance company offers policies for fire insurance. Achmed holds a poli- 
cy according to which he gets full refund for that part of the claim which exceeds 
$3000. He gets nothing for a claim size less than or equal to $ 3000. The company 
knows that the average claim size is $5642. 


(1) What is the mean refund Achmed gets from the company for a claim if the claim 
size is exponentially distributed? 

(2) What is the mean refund Achmed gets from the company for a claim if the claim 
size is Rayleigh-distributed? 


2.46) Pedro runs a fruit shop. Mondays he opens his shop with a fresh supply of straw- 
berries of s pounds, which is supposed to satisfy the demand for three days. He knows 
that for this time span the demand X is exponentially distributed with a mean value
of 200 pounds. Pedro pays $2 for a pound and sells it for $4. So he will lose $2 for 
each pound he cannot sell, and he will make a profit of $2 out of each pound he sells. 


What amount s = s* of strawberries should Pedro stock for a period of three days to
maximize his mean profit? 
2.47) The probability density function of the random annual energy consumption X
of an enterprise [in $10^8$ kWh] is

$$f(x) = 30\,(x-2)^2\,[1 - 2(x-2) + (x-2)^2], \quad 2 \le x \le 3.$$

(1) Determine the distribution function of X. What is the probability that the annual 
energy consumption exceeds 2.8? 


(2) What is the mean annual energy consumption? 


2.48) The random variable X is normally distributed with mean $\mu = 5$ and standard
deviation $\sigma = 4$.


Determine the respective values of x which satisfy
$P(X \le x) = 0.5$, $P(X > x) = 0.95$, $P(x \le X < 9) = 0.2$, $P(3 < X \le x) = 0.95$,
$P(-x \le X \le +x) = 0.99$.


2.49) The response time of an average male car driver is normally distributed with 
mean value 0.5 and standard deviation 0.06 (in seconds). 

(1) What is the probability that his response time is greater than 0.6 seconds? 

(2) What is the probability that his response time is between 0.50 and 0.55 seconds? 


2.50) The tensile strength of a certain brand of paper is modeled by a normal
distribution with mean 24 psi and variance 9 [$\mathrm{psi}^2$].

What is the probability that the tensile strength of a sample does not fall below the 
critical level of 20 psi? 


2.51) The total monthly sick leave time of employees of a small company has a nor- 
mal distribution with mean 100 hours and standard deviation 20 hours. 

(1) What is the probability that the total monthly sick leave time will be between 50 
and 80 hours? 

(2) How much time has to be budgeted for sick leave to make sure that the budgeted 
total amount for sick leave is only exceeded with a probability of less than 0.1? 


2.52) The random variable X has a Weibull distribution with mean value 12 and vari- 
ance 9. 


(1) Calculate the parameters $\beta$ and $\theta$ of this distribution.
(2) Determine the conditional probabilities $P(X > 10 \mid X > 8)$ and $P(X \le 6 \mid X > 8)$.


2.53) The random measurement error X of a meter has a normal distribution with
mean 0 and variance $\sigma^2$, i.e., $X = N(0, \sigma^2)$. It is known that the percentage of
measurements which deviate from the 'true' value by more than 0.4 in absolute value is 80%.
Use this piece of information to determine $\sigma$.
2.54) If sand from gravel pit 1 is used, then molten glass for producing armored glass
has a random impurity content X which is N(60, 16)-distributed. But if sand from
gravel pit 2 is used, then this content is N(62, 9)-distributed ($\mu$ and $\sigma$ in 0.01%). The
admissible degree of impurity should not exceed 0.64%.


Sand from which gravel pit should be used? 


2.55) Let X have a geometric distribution with
$$P(X = i) = (1-p)\,p^i; \quad i = 0, 1, \ldots; \quad 0 \le p \le 1.$$

By mixing these geometric distributions with regard to a suitable structure distribution
density f(p) show that

$$\sum_{i=0}^{\infty} \frac{1}{(i+1)(i+2)} = 1.$$

2.56) A random variable X has distribution function
$$F(x) = e^{-\alpha/x}; \quad \alpha > 0,\ x > 0$$

(Fréchet distribution).


What distribution type arises when mixing this distribution with regard to the exponential
structure distribution density $f(\alpha) = \lambda e^{-\lambda \alpha}$; $\lambda > 0$, $\alpha > 0$?


2.57) The random variable X has distribution function (Lomax distribution, page 93) 


Ca evar x>0. 


Check whether there is a subinterval of $[0, \infty)$ on which F(x) is DFR or IFR.


2.58) Check the aging behavior of systems whose lifetime distributions have 
(1) a Frechét distribution with distribution function F(x) = e~G”, x > 0 (sketch its 
failure rate), and 


(2) a power distribution with distribution function $F(x) = 1 - (1/x^2)$, $x \ge 1$,
respectively.


2.59) Let F(x) be the distribution function of a nonnegative random variable X with
finite mean value $\mu$.


(1) Show that the function $F_S(x)$ defined by

$$F_S(x) = \frac{1}{\mu} \int_0^x (1 - F(t))\,dt$$

is the distribution function of a nonnegative random variable $X_S$.


(2) Prove: If X is exponentially distributed with parameter $\lambda = 1/\mu$, then so is $X_S$ and
vice versa.


(3) Determine the failure rate $\lambda_S(x)$ of $X_S$.
2.60) Let X be a random variable with range {1, 2, ...} and probability distribution


9 pas eee 
PSS ese eer 


Determine the z-transform of X and by means of it $E(X)$, $E(X^2)$, and $Var(X)$.


2.61) Determine the Laplace transform $\hat{f}(s)$ of the density of the Laplace distribution
with parameters $\lambda$ and $\mu$ (page 66):


$$f(x) = \frac{\lambda}{2}\, e^{-\lambda |x - \mu|}, \quad -\infty < x < +\infty.$$


By means of $\hat{f}(s)$ determine $E(X)$, $E(X^2)$, and $Var(X)$.


CHAPTER 3 


Multidimensional Random Variables 


The previous chapter essentially dealt with one-dimensional random variables and 
their probabilistic characterization and properties. Frequently a joint probabilistic 
analysis of two or more random variables is necessary. For instance, for weather 
predictions the meteorologist must take into account the interplay of randomly
fluctuating parameters such as air pressure, temperature, wind force and direction, humidity,
et cetera. The operator of a coal power station, in order to plan the output of the
station properly, needs to take into account outdoor temperature as
well as ash and water content of the coal presently available. These three parameters
have a random component and there is a dependency between ash and water content.
The information technologist, when analyzing stochastic signals, has to consider
their random phases and amplitudes jointly. The forester, who has to estimate the
amount of wood in a forest stand, measures both height and stem diameter (at a
height of 1.3m) of trees. Even in chapter 2 of this book vectors of random variables
occurred without this having been explicitly pointed out: When a die is tossed twice, then
the outcome is $(X_1, X_2)$. The binomial distribution is derived from a sequence of n
binary random variables $(X_1, X_2, \ldots, X_n)$. More challenging situations will be
discussed in Part II of this book: Let, for instance, X(t) be the price of a unit of stock at
time t and $0 < t_1 < t_2 < \cdots < t_n$. Then the components of the n-dimensional vector
$(X(t_1), X(t_2), \ldots, X(t_n))$ are the random stock prices at time points $t_i$. There is an
obvious dependency between the $X(t_i)$ so that for the prediction of the stock price
development in time the random variables $X(t_i)$ should not be analyzed separately from
each other. The same applies to other time series such as temperatures, population
sizes, et cetera, registered at increasing time points.


3.1 TWO-DIMENSIONAL RANDOM VARIABLES 


3.1.1 Discrete Components 


Let X and Y be two random variables, which are combined to a random vector (X, Y).
This vector is also called a two-dimensional random variable or a bivariate random
variable. In this section, X and Y are assumed to be discrete random variables with
respective ranges $R_X = \{x_0, x_1, \ldots\}$ and $R_Y = \{y_0, y_1, \ldots\}$. Then the range of (X, Y) is
the set of two-dimensional vectors


$$R_{XY} = \{(x, y);\ x \in R_X,\ y \in R_Y\}.$$
The (deterministic) vector (x, y) is called a realization of (X, Y).
For instance, if two dice are thrown simultaneously and the outcomes are X and Y,
respectively, then the range of (X, Y) is


$$R_{XY} = \{(i, j);\ i, j = 1, 2, \ldots, 6\}.$$
If X and Y are the random numbers of traffic accidents occurring per year in the two
neighboring towns Atown and Betown, respectively, then
$$R_X = \{0, 1, \ldots\} \quad\text{and}\quad R_Y = \{0, 1, \ldots\},$$
and the range of (X, Y) is $R_{XY} = \{(i, j);\ i, j = 0, 1, 2, \ldots\}$. It makes sense to consider X
and Y together, since weather, seasonal factors, vacation periods, and other
conditions induce a dependency between X and Y.


Joint probability distribution Let
$$\{p_i = P(X = x_i);\ i = 0, 1, \ldots\} \quad\text{and}\quad \{q_j = P(Y = y_j);\ j = 0, 1, \ldots\}$$
be the probability distributions of X and Y, respectively. Furthermore, let
$$r_{ij} = P(X = x_i \cap Y = y_j) \quad\text{for all } (x_i, y_j) \in R_{XY} \tag{3.1}$$
be the probabilities for the joint occurrence of the random events '$X = x_i$' and '$Y = y_j$'.
The set of probabilities
$$\{r_{ij};\ i, j = 0, 1, \ldots\} \tag{3.2}$$
is the joint or two-dimensional probability distribution of the random vector (X, Y).
From the definition of the $r_{ij}$,


$$p_i = \sum_{j=0}^{\infty} r_{ij}, \qquad q_j = \sum_{i=0}^{\infty} r_{ij}. \tag{3.3}$$


Marginal Distributions The probability distribution $\{p_i;\ i = 0, 1, \ldots\}$ of X and the
probability distribution $\{q_j;\ j = 0, 1, \ldots\}$ of Y are called the marginal distributions of
(X, Y). The marginal distributions of (X, Y) do not contain the full information on the
joint probability distribution of (X, Y) if there is a dependency between X and Y.
However, if X and Y are independent, then the joint probability distribution of (X, Y) and
its marginal distributions are equivalent in this regard.


Definition 3.1 (independence) Two discrete random variables X and Y are
(statistically) independent if


$$r_{ij} = p_i\, q_j; \quad i, j = 0, 1, \ldots.$$
If X and Y are independent, then the value which X has assumed has no influence on
the value which Y has assumed, and vice versa. This is the situation when throwing
two dice simultaneously, or when X denotes the number of shark attacks on humans
occurring at the shores of South Africa in 2025 and Y the ones at the shores of
Hawaii in 2030. The mean value of the product XY is


$$E(XY) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} r_{ij}\, x_i\, y_j. \tag{3.4}$$
For independent X and Y, the mean value of XY becomes
$$E(XY) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} p_i\, q_j\, x_i\, y_j = \Big( \sum_{i=0}^{\infty} p_i\, x_i \Big) \Big( \sum_{j=0}^{\infty} q_j\, y_j \Big)$$
so that
$$E(XY) = E(X) \cdot E(Y). \tag{3.5}$$


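Conditional Probability Distribution By formula (1.22), the conditional
probabilities of $X = x_i$ given $Y = y_j$ and $Y = y_j$ given $X = x_i$, respectively, are


$$P(X = x_i \mid Y = y_j) = \frac{r_{ij}}{q_j}, \qquad P(Y = y_j \mid X = x_i) = \frac{r_{ij}}{p_i}.$$


The sets


$$\Big\{ \frac{r_{ij}}{q_j};\ i = 0, 1, \ldots \Big\} \quad\text{and}\quad \Big\{ \frac{r_{ij}}{p_i};\ j = 0, 1, \ldots \Big\}$$
are the conditional probability distributions of X given $Y = y_j$ and of Y given $X = x_i$,
respectively. The corresponding conditional mean values are
$$E(X \mid Y = y_j) = \sum_{i=0}^{\infty} x_i\, \frac{r_{ij}}{q_j}, \qquad E(Y \mid X = x_i) = \sum_{j=0}^{\infty} y_j\, \frac{r_{ij}}{p_i}.$$


If X and Y are independent, then the conditions have no influence on the respective
mean values, since $r_{ij}/q_j = p_i$ and $r_{ij}/p_i = q_j$ (see formula 2.7):


$$E(X \mid Y = y_j) = E(X), \qquad E(Y \mid X = x_i) = E(Y); \quad i, j = 0, 1, \ldots.$$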


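The conditional mean value E(X|Y) of X given Y is a random variable, since the
condition is random. The range of E(X|Y) is


$$\{E(X \mid Y = y_0),\ E(X \mid Y = y_1),\ \ldots\},$$
and the mean value of E(X|Y) is E(X), since


$$E(E(X \mid Y)) = \sum_{j=0}^{\infty} E(X \mid Y = y_j)\, P(Y = y_j) = \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} x_i\, \frac{r_{ij}}{q_j}\, q_j$$


$$= \sum_{i=0}^{\infty} x_i \sum_{j=0}^{\infty} r_{ij} = \sum_{i=0}^{\infty} x_i\, p_i = E(X).$$
Because the roles of X and Y can be exchanged,
$$E(E(X \mid Y)) = E(X) \quad\text{and}\quad E(E(Y \mid X)) = E(Y). \tag{3.6}$$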


Example 3.1 Two dice are thrown. The outcomes are $X_1$ and $X_2$, respectively. Let
$X = \max(X_1, X_2)$ and Y = 'total number of even figures in $(X_1, X_2)$.'

The ranges of X and Y are $R_X = \{1, 2, 3, 4, 5, 6\}$ and $R_Y = \{0, 1, 2\}$. Since $X_1$ and $X_2$
are independent,


$$P(X_1 = i,\ X_2 = j) = P(X_1 = i)\, P(X_2 = j) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}.$$


By (3.3), the $q_j$ and the $p_i$ are the corresponding row and column sums in Table 3.1.
   Y \ X      1      2      3      4      5      6     q_j

     0      1/36     0    3/36     0    5/36     0     9/36
     1        0    2/36   2/36   4/36   4/36   6/36   18/36
     2        0    1/36     0    3/36     0    5/36    9/36

    p_i     1/36   3/36   5/36   7/36   9/36  11/36      1


Table 3.1 Joint distribution and marginal distribution for example 3.1


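The mean values of X and Y are
$$E(X) = \tfrac{1}{36}\,(1 \cdot 1 + 2 \cdot 3 + 3 \cdot 5 + 4 \cdot 7 + 5 \cdot 9 + 6 \cdot 11) = \tfrac{161}{36} \approx 4.472,$$
$$E(Y) = \tfrac{1}{36}\,(0 \cdot 9 + 1 \cdot 18 + 2 \cdot 9) = 1.$$


X and Y are not independent of each other: If X = 1, then Y = 0. If X = 2, then Y can
only be 1 or 2, and so on. Hence, it makes sense to determine the conditional
distributions, e.g.


$$\Big\{ P(X = i \mid Y = j) = \frac{r_{ij}}{q_j};\ i = 1, 2, \ldots, 6 \Big\}; \quad j = 0, 1, 2.$$


$j = 0$: $\{\tfrac{1}{9}, 0, \tfrac{3}{9}, 0, \tfrac{5}{9}, 0\}$, $E(X \mid Y = 0) = \tfrac{35}{9} \approx 3.889$.
$j = 1$: $\{0, \tfrac{2}{18}, \tfrac{2}{18}, \tfrac{4}{18}, \tfrac{4}{18}, \tfrac{6}{18}\}$, $E(X \mid Y = 1) = \tfrac{41}{9} \approx 4.556$.
$j = 2$: $\{0, \tfrac{1}{9}, 0, \tfrac{3}{9}, 0, \tfrac{5}{9}\}$, $E(X \mid Y = 2) = \tfrac{44}{9} \approx 4.889$. □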
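
The joint distribution of Example 3.1 can be rebuilt by enumerating the 36 equally likely outcomes of the two dice. The following is a minimal sketch in Python with exact fractions; the names `r`, `p`, `q`, `cond_mean_x` and `tower` are chosen here for illustration and do not come from the text. It recomputes the marginal distributions by (3.3), the conditional mean values, and checks property (3.6).

```python
from fractions import Fraction
from itertools import product

# Joint distribution r[(x, y)] of X = max(X1, X2) and
# Y = number of even figures, built from all 36 outcomes of two dice.
r = {}
for x1, x2 in product(range(1, 7), repeat=2):
    x = max(x1, x2)
    y = (x1 % 2 == 0) + (x2 % 2 == 0)
    r[(x, y)] = r.get((x, y), Fraction(0)) + Fraction(1, 36)

# Marginal distributions by formula (3.3): sum over the other index.
p = {x: sum(r.get((x, y), 0) for y in range(3)) for x in range(1, 7)}
q = {y: sum(r.get((x, y), 0) for x in range(1, 7)) for y in range(3)}

mean_x = sum(x * p[x] for x in p)
mean_y = sum(y * q[y] for y in q)

# Conditional mean values E(X | Y = j) = sum_i x_i * r_ij / q_j.
cond_mean_x = {
    y: sum(x * r.get((x, y), 0) for x in range(1, 7)) / q[y] for y in q
}

# Formula (3.6): averaging the conditional means over Y returns E(X).
tower = sum(cond_mean_x[y] * q[y] for y in q)

print("E(X) =", mean_x, " E(Y) =", mean_y)
print("E(X|Y=j):", cond_mean_x)
print("E(E(X|Y)) == E(X):", tower == mean_x)
```

Working with `Fraction` rather than floating-point numbers keeps every entry comparable with the fractions in Table 3.1.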


3.1.2 Continuous Components


3.1.2.1 Probability Distribution 
Let X and Y be continuous, real-valued random variables with distribution functions
$$F_X(x) = P(X \le x), \qquad F_Y(y) = P(Y \le y)$$


and ranges $R_X$, $R_Y$, respectively. As with discrete random variables X and Y, (X, Y) is
called a random vector, a two-dimensional random variable, or a bivariate random
variable. Analogously to the distribution function of a (one-dimensional) random
variable, there is a function which contains the complete probabilistic information
on (X, Y). This is the joint distribution function $F_{X,Y}(x, y)$ of X and Y defined by


$$F_{X,Y}(x, y) = P(X \le x,\ Y \le y), \quad x \in R_X,\ y \in R_Y,$$


where '$X \le x,\ Y \le y$' = '$X \le x \cap Y \le y$.' (For discrete random variables X and Y the
joint distribution function is defined in the same way.) To discuss the properties of
the joint distribution function, it can be assumed without loss of generality that
$$R_X = R_Y = (-\infty, +\infty).$$
$F_{X,Y}(x, y)$ has the following properties:


(1) $F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$, $F_{X,Y}(+\infty, +\infty) = 1$.
(2) $0 \le F_{X,Y}(x, y) \le 1$.
(3) $F_{X,Y}(x, +\infty) = F_X(x)$, $F_{X,Y}(+\infty, y) = F_Y(y)$.


(4) For $x_1 \le x_2$ and $y_1 \le y_2$,
$$F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_1) \le F_{X,Y}(x_2, y_2),$$
$$F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_1, y_2) \le F_{X,Y}(x_2, y_2).$$


Thus, $F_{X,Y}(x, y)$ is nondecreasing in every argument.


(5) $P(X > x,\ Y \le y) = F_Y(y) - F_{X,Y}(x, y)$.
(6) $P(X \le x,\ Y > y) = F_X(x) - F_{X,Y}(x, y)$.
(7) $P(X > x,\ Y > y) = 1 - F_Y(y) - F_X(x) + F_{X,Y}(x, y)$.


A generalization of the formula (2.44) to random vectors (X, Y) is
$$P(a < X \le b,\ c < Y \le d) = [F_{X,Y}(b, d) - F_{X,Y}(b, c)] - [F_{X,Y}(a, d) - F_{X,Y}(a, c)]. \tag{3.7}$$


Any function F(x, y) which has properties (1) and (4) and is continuous from the right in
x and y is the joint distribution function of a random vector (X, Y) if, in addition, the
right-hand side of (3.7) is nonnegative for all a, b and c, d with a < b and c < d (see
exercise 3.17). Properties (5) to (7) are implications of properties (1) and (4). For
instance, to prove (5), the random event '$X > x,\ Y \le y$' is equivalently represented as
'$Y \le y$' \ '$X \le x,\ Y \le y$'. Hence, by formula (1.14),

$$P(X > x,\ Y \le y) = P(Y \le y) - P(X \le x,\ Y \le y) = F_Y(y) - F_{X,Y}(x, y).$$


Property (6) follows from (5) by changing the roles of X and Y. Property (7) is a
special case of formula (3.7) (see exercise 3.16 for a proof of formula (3.7)).
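
Formula (3.7) is easy to try out numerically. The sketch below assumes, purely for illustration, two independent exponentially distributed components with parameter 1, so that the joint distribution function is the product of the marginal ones; the function names and the rectangle bounds are hypothetical choices, not taken from the text. Under this independence assumption the rectangle probability must equal the product of the two interval probabilities, which gives a simple check.

```python
import math

def F_exp(t, lam=1.0):
    """Distribution function of an exponential random variable."""
    return 1.0 - math.exp(-lam * t) if t > 0 else 0.0

def F_joint(x, y):
    """Joint distribution function, ASSUMING X and Y are independent."""
    return F_exp(x) * F_exp(y)

def rect_prob(F, a, b, c, d):
    """P(a < X <= b, c < Y <= d) computed by formula (3.7)."""
    return (F(b, d) - F(b, c)) - (F(a, d) - F(a, c))

# Illustrative rectangle (values chosen arbitrarily).
a, b, c, d = 0.5, 2.0, 1.0, 3.0
p_rect = rect_prob(F_joint, a, b, c, d)

# With independent components the probability factorizes.
p_product = (F_exp(b) - F_exp(a)) * (F_exp(d) - F_exp(c))

print(p_rect, p_product)
```

The helper `rect_prob` accepts any joint distribution function of two arguments, so a dependent model can be substituted for `F_joint` without touching the rest.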


Note Properties (1) to (7) also are true for random vectors with discrete components. 


The probability distribution functions of X and Y are the marginal distribution
functions of the two-dimensional random variable (X, Y), and the pair $(F_X, F_Y)$ is the
marginal distribution of (X, Y).


Joint Probability Density Assuming its existence, the mixed partial derivative of $F_{X,Y}(x, y)$
with respect to x and y,


$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}, \tag{3.8}$$


is called the joint (probability) density of (X, Y). Equivalently, the joint density can
be defined as a function $f_{X,Y}(x, y)$ satisfying


$$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\, du\, dv, \quad -\infty < x, y < +\infty. \tag{3.9}$$
Every joint (probability) density has the two properties


$$f_{X,Y}(x, y) \ge 0, \qquad \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx\, dy = 1. \tag{3.10}$$
Conversely, any function of two variables x and y satisfying these two conditions can
be considered the joint density of a random vector (X, Y). From property (3) of the
joint distribution function and formula (3.9) one obtains the marginal densities of (X, Y)
in terms of the joint density:


$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx. \tag{3.11}$$


Analogously to discrete random variables, the marginal distribution $\{F_X, F_Y\}$ or, in
terms of the densities, $\{f_X(x), f_Y(y)\}$, does not contain the full information on the
joint probability distribution of (X, Y) as given by $F_{X,Y}(x, y)$ if there is a (statistical)
dependency between X and Y. If X and Y are independent, then $F_{X,Y}(x, y)$ and its
marginal distribution $\{F_X, F_Y\}$ are equivalent in this regard:


Definition 3.2 (independence) Two random variables X and Y are independent if
$$F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y).$$

Remark For discrete random variables this definition of independence is equivalent to the one
given by definition 3.1. Representations of the distribution functions of discrete random
variables are given on page 43.


In terms of the densities, X and Y are independent if and only if


$$f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y). \tag{3.12}$$
The mean value of XY is
$$E(XY) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x\, y\, f_{X,Y}(x, y)\, dx\, dy. \tag{3.13}$$
As with discrete random variables (formula 3.5), for independent random variables:
$$E(XY) = E(X) \cdot E(Y). \tag{3.14}$$


Although in many applications the independence assumption is not justified, analyti- 
cal results can frequently only be derived under this assumption. A reason for this 
situation is, apart from mathematical challenges, the inherent difficulties the analyst 
faces when trying to quantify statistical dependency. 

Let $R_{\Delta x \Delta y}$ be a rectangle which contains the point (x, y) and has sufficiently small
side lengths $\Delta x$ and $\Delta y$. Then the random vector (X, Y) assumes a realization from
this rectangle approximately with probability

$$P((X, Y) \in R_{\Delta x \Delta y}) \approx f_{X,Y}(x, y)\, \Delta x\, \Delta y.$$


More generally, if B is an area in the plane, then the probability that the vector (X, Y)
assumes a realization from B is given by the surface integral


$$P((X, Y) \in B) = \iint_B f_{X,Y}(x, y)\, dx\, dy. \tag{3.15}$$
Figure 3.1 Normal regions with regard to the x-axis a) and the y-axis b)


For a normal region with regard to the x-axis
$$B = \{a \le x \le b,\ y_1(x) \le y \le y_2(x)\}$$
(Figure 3.1a), the surface integral (3.15) can be calculated by the double integral


$$P((X, Y) \in B) = \int_a^b \Big( \int_{y_1(x)}^{y_2(x)} f_{X,Y}(x, y)\, dy \Big) dx. \tag{3.16}$$


For a normal region with regard to the y-axis


$$B = \{x_1(y) \le x \le x_2(y),\ c \le y \le d\}$$
(Figure 3.1b), the surface integral (3.15) can be calculated by the double integral
$$P((X, Y) \in B) = \int_c^d \Big( \int_{x_1(y)}^{x_2(y)} f_{X,Y}(x, y)\, dx \Big) dy.$$
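Double integrals can frequently be more efficiently calculated by transition from the
Cartesian coordinates x and y to curvilinear coordinates u and v:
$$u = u(x, y),\ v = v(x, y) \quad\text{or}\quad x = x(u, v),\ y = y(u, v).$$
Then the normal region B with regard to e.g. the x-axis is transformed to a region B':
$$B' = \{a' \le u \le b',\ v_1(u) \le v \le v_2(u)\},$$


and the double integral (3.16) becomes


$$\iint_B f_{X,Y}(x, y)\, dx\, dy = \int_{a'}^{b'} \left( \int_{v_1(u)}^{v_2(u)} f_{X,Y}(x(u, v), y(u, v)) \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dv \right) du, \tag{3.17}$$
where
$$\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix}$$


is the functional determinant of the transformation.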
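
As an illustration of the transformation formula (3.17), consider the transition to polar coordinates. The joint density used below (two independent standard normal components) and the disk B of radius R are assumptions made only for this worked sketch; they are not taken from the text.

```latex
% Assumed density (illustration only):
%   f_{X,Y}(x,y) = \frac{1}{2\pi} e^{-(x^2+y^2)/2}
% Assumed region: B = \{x^2 + y^2 \le R^2\}, a disk of radius R > 0
% Transformation: x = r\cos\varphi, \; y = r\sin\varphi \quad (u = r,\; v = \varphi)
\[
\frac{\partial(x,y)}{\partial(r,\varphi)}
= \begin{vmatrix}
\cos\varphi & \sin\varphi \\
-r\sin\varphi & r\cos\varphi
\end{vmatrix}
= r\cos^2\varphi + r\sin^2\varphi = r ,
\qquad
B' = \{0 \le r \le R,\; 0 \le \varphi \le 2\pi\}.
\]
\[
P((X,Y)\in B)
= \int_0^R \left( \int_0^{2\pi} \frac{1}{2\pi}\, e^{-r^2/2}\, r \, d\varphi \right) dr
= \int_0^R r\, e^{-r^2/2}\, dr
= 1 - e^{-R^2/2}.
\]
```

In Cartesian coordinates the same integral has no elementary antiderivative; in polar coordinates the factor r supplied by the functional determinant makes the integrand directly integrable.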
If $B = \{a < X \le b,\ c < Y \le d\}$, then (3.16) becomes


$$P((X, Y) \in B) = \int_a^b \Big( \int_c^d f_{X,Y}(x, y)\, dy \Big) dx.$$
This integral easily implies formula (3.7).


Figure 3.2 Integration region for example 3.2 


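Example 3.2 The joint probability density of the random vector (X, Y) is
$$f_{X,Y}(x, y) = e^{-(x+y)}, \quad x \ge 0,\ y \ge 0.$$
(1) The corresponding marginal densities are
$$f_X(x) = \int_0^{\infty} e^{-(x+y)}\, dy = e^{-x}, \qquad f_Y(y) = \int_0^{\infty} e^{-(x+y)}\, dx = e^{-y}; \quad x, y \ge 0.$$


Thus, X and Y are both exponentially distributed with parameter $\lambda = 1$. Moreover,
since $e^{-(x+y)} = e^{-x} \cdot e^{-y}$, X and Y are independent.


(2) Let $B = \{|Y - X| \le 1\}$. The region B is hatched in Figure 3.2. The lower bound for
B is y = 0 if $0 \le x \le 1$ and y = x − 1 if $1 \le x$. The upper bound is y = x + 1 if $x \ge 0$.
Therefore, the outer integral of formula (3.16) has to be split with regard to the
x-intervals [0, 1] and $[1, \infty)$:


$$P(|Y - X| \le 1) = \int_0^1 \Big( \int_0^{x+1} e^{-(x+y)}\, dy \Big) dx + \int_1^{\infty} \Big( \int_{x-1}^{x+1} e^{-(x+y)}\, dy \Big) dx$$
$$= \int_0^1 \big[ e^{-x} - e^{-(2x+1)} \big]\, dx + \int_1^{\infty} \big[ e^{-(2x-1)} - e^{-(2x+1)} \big]\, dx$$
$$= 1 - 1/e.$$
Hence, $P(|Y - X| \le 1) \approx 0.632$. □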
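
The result of Example 3.2 can be cross-checked numerically. The sketch below evaluates the inner integral of (3.16) in closed form and applies a composite midpoint rule to the outer integral; the helper names, the number of steps, and the truncation of the infinite x-range at 40 are choices made here for illustration, not part of the example.

```python
import math

def inner(x):
    """Inner integral of (3.16) for Example 3.2 in closed form:
    integral of exp(-(x + y)) over y from max(0, x - 1) to x + 1."""
    lower = max(0.0, x - 1.0)
    upper = x + 1.0
    return math.exp(-x) * (math.exp(-lower) - math.exp(-upper))

def midpoint(g, a, b, n):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# The outer integral runs over [0, infinity). It is truncated at x = 40
# (an assumption of this sketch), where the integrand is below exp(-40).
prob = midpoint(inner, 0.0, 40.0, 40_000)
exact = 1.0 - 1.0 / math.e

print(round(prob, 4), round(exact, 4))
```

Using `max(0, x - 1)` as the lower bound covers both x-intervals [0, 1] and [1, ∞) of the example with a single function, so the split of the outer integral does not have to be coded explicitly.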
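Example 3.3 Let
$$f_{X,Y}(x, y) = \tfrac{1}{2}\, x\, y, \quad 0 \le x \le y \le 2.$$


(1) Show that $f_{X,Y}(x, y)$ is a joint probability density.
(2) Determine the probability $P(X^2 > Y)$.
(3) Are X and Y independent?


Figure 3.3 Possible (shaded) and favorable (hatched) region for (X,Y) (example 3.3)


(1) It needs to be shown that the conditions (3.10) are fulfilled. $f_{X,Y}(x, y)$ is obviously
nonnegative. Further,
$$\int_0^2 \Big( \int_x^2 \tfrac{1}{2}\, x\, y\, dy \Big) dx = \tfrac{1}{2} \int_0^2 x \Big( \int_x^2 y\, dy \Big) dx$$


$$= \tfrac{1}{2} \int_0^2 \big( 2x - x^3/2 \big)\, dx = \tfrac{1}{2} \Big[ x^2 - \tfrac{x^4}{8} \Big]_0^2 = 1.$$


(2) In Figure 3.3 the possible set of realizations of (X, Y) is shaded, and the region B
for which $X^2 > Y$ is hatched. The upper bound of B is given by the parabola $y = x^2$
between x = 1 and $x = \sqrt{2}$ and the straight line y = 2 between $x = \sqrt{2}$ and x = 2. The
lower bound of B is the straight line y = x between x = 1 and x = 2. Hence, the desired
probability is


$$P(X^2 > Y) = \int_1^{\sqrt{2}} \Big( \int_x^{x^2} \tfrac{1}{2}\, x\, y\, dy \Big) dx + \int_{\sqrt{2}}^{2} \Big( \int_x^{2} \tfrac{1}{2}\, x\, y\, dy \Big) dx$$
$$= \tfrac{1}{4} \int_1^{\sqrt{2}} (x^5 - x^3)\, dx + \tfrac{1}{4} \int_{\sqrt{2}}^{2} (4x - x^3)\, dx$$


$$= \tfrac{1}{4} \Big( \tfrac{8}{6} - \tfrac{4}{4} - \tfrac{1}{6} + \tfrac{1}{4} + 8 - 4 - 4 + 1 \Big) = \tfrac{17}{48}.$$


Thus, $P(X^2 > Y) \approx 0.354$.
(3) The marginal densities $f_X(x)$ and $f_Y(y)$ are


$$f_X(x) = \int_x^2 \tfrac{1}{2}\, x\, y\, dy = \tfrac{1}{2}\, x \Big[ \tfrac{y^2}{2} \Big]_x^2 = \tfrac{1}{4}\,(4x - x^3), \quad 0 \le x \le 2,$$


$$f_Y(y) = \int_0^y \tfrac{1}{2}\, x\, y\, dx = \tfrac{1}{2}\, y \Big[ \tfrac{x^2}{2} \Big]_0^y = \tfrac{1}{4}\, y^3, \quad 0 \le y \le 2.$$


Since
$$f_{X,Y}(x, y) \ne f_X(x)\, f_Y(y),$$
X and Y are not independent. □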
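
Both integrals of Example 3.3 have the form (3.16) of an integral over a normal region with regard to the x-axis, so they can be checked with one small routine. The sketch below takes the density as f(x, y) = xy/2 on 0 ≤ x ≤ y ≤ 2, which is the reading consistent with the normalization in part (1); the helper names and the step count are illustrative choices.

```python
def midpoint(g, a, b, n=400):
    """Composite midpoint rule for the integral of g over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f(x, y):
    """Joint density of Example 3.3 (read as x*y/2 on 0 <= x <= y <= 2)."""
    return 0.5 * x * y

def prob_normal_region(y_low, y_high, a, b):
    """Formula (3.16): integrate f over
    {a <= x <= b, y_low(x) <= y <= y_high(x)}."""
    return midpoint(
        lambda x: midpoint(lambda y: f(x, y), y_low(x), y_high(x)), a, b
    )

# (1) Total probability mass over the triangle 0 <= x <= y <= 2.
total = prob_normal_region(lambda x: x, lambda x: 2.0, 0.0, 2.0)

# (2) P(X^2 > Y): lower bound y = x, upper bound y = min(x^2, 2), 1 <= x <= 2.
p = prob_normal_region(lambda x: x, lambda x: min(x * x, 2.0), 1.0, 2.0)

print(round(total, 4), round(p, 4), round(17 / 48, 4))
```

Writing the upper bound as `min(x * x, 2.0)` merges the two pieces of the boundary (parabola and horizontal line), so the integral does not need to be split at x = √2.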
Two-Dimensional Uniform Distribution The random vector (X, Y) has a uniform
distribution in a finite region B of the (x, y)-plane with positive area $\mu(B)$ if


$$f_{X,Y}(x, y) = \frac{1}{\mu(B)}, \quad (x, y) \in B.$$


Outside B the joint density $f_{X,Y}(x, y)$ is 0. The conditions (3.10) are fulfilled since
$$\iint_B f_{X,Y}(x, y)\, dx\, dy = \iint_B \frac{1}{\mu(B)}\, dx\, dy = \frac{1}{\mu(B)} \iint_B dx\, dy = 1.$$


For any $A \subseteq B$ the probability that (X, Y) assumes a value from A is
$$P((X, Y) \in A) = \frac{\mu(A)}{\mu(B)}.$$
Remark The uniform distribution of a random vector in a plane is identical to the geometric
distribution introduced in section 1.3.2 (formula (1.8)) if $\Omega$ is a finite subset of a plane.


Figure 3.4 Possible and favorable region for example 3.4


Example 3.4 Let X be the daily power production of a power station, and let Y be the 
daily demand of the consumer. The random vector (X, Y) has a uniform distribution 
over the region 


$$B = \{900 \le x \le 1000,\ 850 \le y \le 950\}.$$
What is the probability that the demand exceeds the supply? 
The possible realizations of the random vector are in the shaded region (region B) of 
Figure 3.4. Its area is 10 000. Hence, the joint density of (X, Y) is
$$f_{X,Y}(x, y) = \frac{1}{10\,000}, \quad (x, y) \in B.$$
The subregion of B, where Y > X, is the hatched part of B. Its lower bound is the
straight line y = x. Hence, the desired probability is


$$P(Y > X) = \int_{900}^{950} \Big( \int_x^{950} \frac{1}{10\,000}\, dy \Big) dx = \frac{1}{10\,000} \int_{900}^{950} (950 - x)\, dx,$$


which works out to be P(Y > X) = 0.125.


Of course, no integration is required to arrive at this result, since the area of the
hatched part is a half of the area of a square with side length 50. □
Theorem 3.1 (1) If X and Y are independent and uniformly distributed in the respective
intervals [a, b] and [c, d], then the random vector (X, Y) has a uniform distribution
on the rectangle

$$B = \{a \le x \le b,\ c \le y \le d\}.$$
(2) Conversely, if (X, Y) has a uniform distribution on the rectangle B, then the
random variables X and Y are independent and uniformly distributed in the intervals
[a, b] and [c, d], respectively.


Proof (1) If X is uniformly distributed in [a, b] and Y in [c, d], then


$$F_X(x) = \frac{x - a}{b - a}, \quad a \le x \le b,$$


$$F_Y(y) = \frac{y - c}{d - c}, \quad c \le y \le d.$$


Hence, by definition 3.2, the joint distribution function of (X, Y) is


$$F_{X,Y}(x, y) = \frac{(x - a)(y - c)}{(b - a)(d - c)}, \quad (x, y) \in B.$$


The corresponding joint density is
$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y} = \frac{1}{(b - a)(d - c)}, \quad (x, y) \in B.$$


$f_{X,Y}(x, y)$ is the joint density of a random vector (X, Y), which is uniformly distributed
on the rectangle B.


(2) If (X, Y) is uniformly distributed in the rectangle B, then its corresponding
marginal densities are


$$f_X(x) = \int_c^d f_{X,Y}(x, y)\, dy = \int_c^d \frac{dy}{(b - a)(d - c)} = \frac{1}{b - a}, \quad a \le x \le b,$$
$$f_Y(y) = \int_a^b f_{X,Y}(x, y)\, dx = \int_a^b \frac{dx}{(b - a)(d - c)} = \frac{1}{d - c}, \quad c \le y \le d,$$


so that $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$. Hence, X and Y are independent and uniformly
distributed in the intervals [a, b] and [c, d], respectively. ■


3.1.2.2 Conditional Probability Distribution 


Given a random vector (X, Y), the conditional distribution function of Y given X = x
and the corresponding conditional density of Y given X = x are denoted as


$$F_Y(y \mid x) = P(Y \le y \mid X = x), \qquad f_Y(y \mid x) = dF_Y(y \mid x)/dy.$$


For continuous random variables, the event 'X = x' has probability 0 so that the
definition of the conditional probability by formula (1.22) cannot directly be applied to
deriving $F_Y(y \mid x)$. Hence, consider for a $\Delta x > 0$ the conditional probability
$$P(Y \le y \mid x \le X \le x + \Delta x) = \frac{P(Y \le y \cap x \le X \le x + \Delta x)}{P(x \le X \le x + \Delta x)}$$


$$= \frac{\int_{-\infty}^{y} \Big( \frac{1}{\Delta x} \int_x^{x + \Delta x} f_{X,Y}(u, v)\, du \Big) dv}{\frac{1}{\Delta x}\, [F_X(x + \Delta x) - F_X(x)]}.$$
If $\Delta x \to 0$, then, assuming $f_X(x) > 0$,
$$F_Y(y \mid x) = \frac{1}{f_X(x)} \int_{-\infty}^{y} f_{X,Y}(x, v)\, dv. \tag{3.18}$$
Differentiation yields the desired conditional density:


$$f_Y(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}. \tag{3.19}$$
By (3.12), if X and Y are independent, then
$$f_Y(y \mid x) = f_Y(y).$$


The conditional mean value of Y given X = x is


$$E(Y \mid x) = \int_{-\infty}^{+\infty} y\, f_Y(y \mid x)\, dy. \tag{3.20}$$


The function $m_Y(x) = E(Y \mid x)$ is called regression function of Y with regard to x. It
quantifies the average dependency of Y on X. For instance, if X is the body weight
and Y the height of a randomly chosen member from a population of adults, then
$m_Y(x)$ is the average height of a member of this population with body weight x. Or:
the difference $m_Y(x + \Delta x) - m_Y(x)$ is the mean increase in body height if the body
weight increases from x to $x + \Delta x$.


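The conditional mean value of Y given X is


$$E(Y \mid X) = \int_{-\infty}^{+\infty} y\, f_Y(y \mid X)\, dy.$$
E(Y|X) is a random variable with property
$$E(E(Y \mid X)) = E(Y). \tag{3.21}$$
This is proved as follows:
$$E(E(Y \mid X)) = \int_{-\infty}^{+\infty} \Big( \int_{-\infty}^{+\infty} y\, f_Y(y \mid x)\, dy \Big) f_X(x)\, dx$$


$$= \int_{-\infty}^{+\infty} \Big( \int_{-\infty}^{+\infty} y\, \frac{f_{X,Y}(x, y)}{f_X(x)}\, dy \Big) f_X(x)\, dx = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} y\, f_{X,Y}(x, y)\, dy\, dx.$$


Hence, by (3.11),

$$E(E(Y \mid X)) = \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy = E(Y).$$
If X and Y are independent, then

$$E(Y \mid X = x) = E(Y \mid X) = E(Y). \tag{3.22}$$
Clearly, the roles of X and Y can be exchanged in the formulas (3.18) to (3.22).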
Formula (3.21), applied to the representation (2.62) of the variance (page 67), can be
used to derive a conditional variance formula for Var(X) (exercise 3.21):


$$Var(X) = E[Var(X \mid Y)] + Var[E(X \mid Y)]. \tag{3.23}$$


Example 3.5 The random vector (X, Y) has the joint probability density
$$f_{X,Y}(x, y) = x + y, \quad 0 \le x, y \le 1.$$


$f_{X,Y}(x, y)$ is nonnegative on the unit square. The marginal densities are
$$f_X(x) = \int_0^1 (x + y)\, dy = \big[ xy + y^2/2 \big]_0^1 = x + 1/2, \quad 0 \le x \le 1,$$
$$f_Y(y) = \int_0^1 (x + y)\, dx = \big[ x^2/2 + yx \big]_0^1 = y + 1/2, \quad 0 \le y \le 1.$$


Since $f_{X,Y}(x, y) \ne f_X(x) \cdot f_Y(y)$, the random variables X and Y are not independent.
(Give an intuitive explanation for this.) The mean value of X is


$$E(X) = \int_0^1 x\,(x + 1/2)\, dx = \big[ x^3/3 + x^2/4 \big]_0^1 = \tfrac{7}{12} \approx 0.5833.$$
In view of the symmetry between x and y in $f_{X,Y}(x, y)$,
$$E(Y) = \tfrac{7}{12} \approx 0.5833.$$
By (3.19), the conditional density of Y on condition X = x is


$$f_Y(y \mid x) = \frac{x + y}{x + 1/2} = 2\, \frac{x + y}{2x + 1}, \quad 0 \le x, y \le 1.$$


The regression function $m_Y(x) = E(Y \mid X = x)$ of Y with regard to x is


$$m_Y(x) = 2 \int_0^1 y\, \frac{x + y}{2x + 1}\, dy = \frac{2}{2x + 1} \int_0^1 (xy + y^2)\, dy = \frac{2}{2x + 1} \Big[ x\, \frac{y^2}{2} + \frac{y^3}{3} \Big]_0^1$$


so that
$$m_Y(x) = \frac{2 + 3x}{3 + 6x}, \quad 0 \le x \le 1.$$
In particular,
$$m_Y(0) = \tfrac{2}{3} \approx 0.6667, \quad m_Y(1) = \tfrac{5}{9} \approx 0.5556, \quad m_Y(0.5) = \tfrac{7}{12} = E(Y) \approx 0.5833.$$


The relatively small influence of the conditions on the conditional mean values
suggests that the dependency between X and Y is not that strong (Figure 3.5). The
conditional mean value of Y given X is the random variable
$$E(Y \mid X) = \frac{2 + 3X}{3 + 6X},$$
which has mean value E(Y) = 7/12. □
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Example 3.6 
i 


Example 3.5 


0 | | L | 
0 02 04 06 0.8 [en 


Figure 3.5 Regression functions for examples 3.5 and 3.6 
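The results of example 3.5 can be cross-checked numerically. The following sketch is not part of the original derivation: it approximates the integrals with a composite midpoint rule (the helper `midpoint` and the chosen step counts are assumptions) and compares the regression function and the mean of the conditional mean with the closed-form values.

```python
def midpoint(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def joint(x, y):
    """Joint density of example 3.5 on the unit square."""
    return x + y

def f_x(x):
    """Marginal density of X, obtained by integrating out y."""
    return midpoint(lambda y: joint(x, y), 0.0, 1.0)

def m_y(x):
    """Regression function E(Y | X = x) by numerical integration."""
    return midpoint(lambda y: y * joint(x, y), 0.0, 1.0) / f_x(x)

# E(E(Y|X)): average the regression function over the density of X, see (3.21).
mean_of_cond_mean = midpoint(lambda x: m_y(x) * f_x(x), 0.0, 1.0)

for x in (0.0, 0.5, 1.0):
    print(x, m_y(x), (2 + 3 * x) / (3 + 6 * x))
print(mean_of_cond_mean, 7 / 12)
```

Up to the discretization error of the midpoint rule, the two columns printed for each x coincide, and the last line reproduces E(Y) = 7/12.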


Example 3.6 The random variable Y has probability density 


$$f_Y(y) = 3y^2, \quad 0 \le y \le 1.$$
On condition Y=y, the random variable X is uniformly distributed in [0,y], y > 0. 
(1) What is the joint probability density of the random vector (X, Y)? 
(2) Determine the conditional mean values E(Y|X =x) and E(X|Y=y). 


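(1) On condition Y = y with y > 0 the density of X is

$$f_X(x \mid y) = \frac{1}{y}, \quad 0 \le x \le y.$$

Hence, by formula (3.19), the joint density of (X, Y) is

$$f_{X,Y}(x,y) = f_X(x \mid y) \cdot f_Y(y) = \frac{1}{y} \cdot 3y^2 = 3y, \quad 0 \le x \le y \le 1.$$

The (unconditional) density of X is obtained from (3.11):

$$f_X(x) = \int_x^1 3y\, dy = 3 \left[ \frac{y^2}{2} \right]_x^1 = 1.5\,(1 - x^2), \quad 0 \le x \le 1.$$

(2) The regression function $m_Y(x) = E(Y \mid x)$ of Y with regard to X = x is

$$m_Y(x) = \int_x^1 y\, \frac{3y}{1.5\,(1-x^2)}\, dy = \frac{2}{1-x^2} \int_x^1 y^2\, dy = \frac{2}{3}\, \frac{1-x^3}{1-x^2}, \quad 0 \le x < 1.$$

The conditional mean value of X given Y = y is

$$E(X \mid y) = \int_{-\infty}^{+\infty} x\, f_X(x \mid y)\, dx = \int_0^y \frac{x}{y}\, dx = 0.5\,y, \quad 0 < y \le 1. \qquad \square$$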




3.1.2.3 Bivariate Normal Distribution 


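The random vector (X,Y) has a bivariate (2-dimensional) normal or a bivariate
(2-dimensional) Gaussian distribution with parameters

$$\mu_X,\ \mu_Y,\ \sigma_X,\ \sigma_Y \text{ and } \rho, \qquad -\infty < \mu_X, \mu_Y < \infty, \quad \sigma_X > 0, \quad \sigma_Y > 0, \quad -1 < \rho < 1,$$

if it has joint density

$$f_{X,Y}(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( \frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\, \frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right\} \tag{3.24}$$

with $-\infty < x, y < +\infty$. By (3.11), the corresponding marginal densities are seen to be

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left( -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right), \quad -\infty < x < +\infty,$$

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\left( -\frac{(y-\mu_Y)^2}{2\sigma_Y^2} \right), \quad -\infty < y < +\infty.$$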


Hence, if (X, Y) has a bivariate normal distribution with parameters $\mu_X, \sigma_X, \mu_Y, \sigma_Y$,
and $\rho$, then the random variables X and Y each have a normal distribution with respective
parameters $\mu_X, \sigma_X$ and $\mu_Y, \sigma_Y$. Since the independence of X and Y is equivalent to

$$f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y),$$

X and Y are independent if and only if $\rho = 0$. (In the next section it will be shown
that the parameter $\rho$ is the correlation coefficient between X and Y, a measure of the
degree of linear statistical dependency between any two random variables.)


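The conditional density of Y given X = x is obtained from $f_{X,Y}(x,y)$ and (3.19):

$$f_Y(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2\sigma_Y^2 (1-\rho^2)} \left[ y - \rho\, \frac{\sigma_Y}{\sigma_X} (x - \mu_X) - \mu_Y \right]^2 \right\}. \tag{3.25}$$

Hence, given X = x, the random variable Y has a normal distribution with parameters

$$E(Y \mid X = x) = \rho\, \frac{\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y \quad \text{and} \quad \operatorname{Var}(Y \mid X = x) = \sigma_Y^2 (1 - \rho^2). \tag{3.26}$$

Thus, the regression function

$$m_Y(x) = E(Y \mid X = x)$$

of Y with regard to X = x for the bivariate normal distribution is a straight line.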


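Example 3.7 The daily consumptions of tap water X and Y of two neighboring towns
have a joint normal distribution with parameters

$$\mu_X = \mu_Y = 16\ [10^3\,\mathrm{m}^3], \quad \sigma_X = \sigma_Y = 2\ [10^3\,\mathrm{m}^3], \quad \text{and } \rho = 0.5.$$

The conditional probability density of Y on condition X = x has parameters

$$E(Y \mid x) = \rho\, \frac{\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y = 0.5 \cdot \frac{2}{2}\,(x - 16) + 16 = \frac{x}{2} + 8,$$

$$\operatorname{Var}(Y \mid x) = \sigma_Y^2 (1 - \rho^2) = 4\,(1 - 0.5^2) = 3.$$

Hence,

$$f_Y(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sqrt{3}} \exp\left\{ -\frac{1}{2} \left( \frac{y - x/2 - 8}{\sqrt{3}} \right)^2 \right\}, \quad -\infty < y < +\infty.$$

This is the density of an N(8 + x/2, 3)-distributed random variable. Some conditional
interval probabilities are:

$$P(14 \le Y \le 16 \mid X = 10) = \Phi\left( \frac{16-13}{\sqrt{3}} \right) - \Phi\left( \frac{14-13}{\sqrt{3}} \right) = 0.958 - 0.718 = 0.240,$$

$$P(14 \le Y \le 16 \mid X = 14) = \Phi\left( \frac{16-15}{\sqrt{3}} \right) - \Phi\left( \frac{14-15}{\sqrt{3}} \right) = 0.718 - 0.282 = 0.436.$$

The corresponding unconditional probability is

$$P(14 \le Y \le 16) = \Phi\left( \frac{16-16}{2} \right) - \Phi\left( \frac{14-16}{2} \right) = 0.500 - 0.159 = 0.341. \qquad \square$$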
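The interval probabilities of example 3.7 can be reproduced with a few lines of code. The sketch below evaluates the standard normal distribution function through the error function of the standard library; the function names `phi`, `conditional_params` and `interval_prob` are illustrative choices, not notation from the text.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def conditional_params(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of Y given X = x according to (3.26)."""
    mean = rho * sigma_y / sigma_x * (x - mu_x) + mu_y
    variance = sigma_y ** 2 * (1.0 - rho ** 2)
    return mean, variance

def interval_prob(a, b, mean, variance):
    """P(a <= Y <= b) for a normal variable with the given mean and variance."""
    s = sqrt(variance)
    return phi((b - mean) / s) - phi((a - mean) / s)

# Parameters of example 3.7 (units of 10^3 cubic metres).
params = dict(mu_x=16.0, mu_y=16.0, sigma_x=2.0, sigma_y=2.0, rho=0.5)

p_given_10 = interval_prob(14, 16, *conditional_params(10, **params))
p_given_14 = interval_prob(14, 16, *conditional_params(14, **params))
p_uncond = interval_prob(14, 16, 16.0, 4.0)

print(round(p_given_10, 3), round(p_given_14, 3), round(p_uncond, 3))
```

Knowing that the neighboring town consumed 14 instead of 10 units shifts the conditional mean of Y from 13 to 15, which is why the conditional probability of the interval [14, 16] nearly doubles.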


3.1.2.4 Bivariate Exponential Distributions 


In this section some joint probability distributions of random vectors (X, Y) with
nonnegative X and Y are considered, whose marginal distributions are one-dimensional
exponential distributions.


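a) A random vector (X, Y) has a Marshall-Olkin distribution if its joint distribution
function $F_{X,Y}(x,y) = P(X \le x, Y \le y)$ is for $x, y \ge 0$ given by

$$F_{X,Y}(x,y) = 1 - e^{-(\lambda_1 + \lambda)x} - e^{-(\lambda_2 + \lambda)y} + e^{-\lambda_1 x - \lambda_2 y - \lambda \max(x,y)} \tag{3.27}$$

with positive parameters $\lambda_1, \lambda_2$, and a nonnegative parameter $\lambda$. By property (3) on
page 121, the corresponding marginal distribution functions are

$$F_X(x) = 1 - e^{-(\lambda_1 + \lambda)x}, \quad F_Y(y) = 1 - e^{-(\lambda_2 + \lambda)y}; \quad x, y \ge 0.$$

Using property (7) on page 121 gives the corresponding joint survival function

$$\bar F_{X,Y}(x,y) = P(X > x, Y > y) = e^{-\lambda_1 x - \lambda_2 y - \lambda \max(x,y)}, \quad x, y \ge 0.$$

The joint density of (X, Y) is

$$f_{X,Y}(x,y) = \begin{cases} \lambda_2 (\lambda_1 + \lambda)\, e^{-(\lambda_1 + \lambda)x - \lambda_2 y} & \text{if } x > y, \\ \lambda_1 (\lambda_2 + \lambda)\, e^{-\lambda_1 x - (\lambda_2 + \lambda)y} & \text{if } x < y. \end{cases}$$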
This distribution has the following physical background: A system, which starts operating
at time point t = 0, consists of two subsystems $S_1$ and $S_2$. They are subject to
three types of shocks: A shock of type i occurs at time $T_i$ and immediately destroys
subsystem $S_i$, i = 1, 2. A shock of type 3 occurs at time T and immediately destroys
both subsystems. The subsystems cannot fail for other reasons. The arrival times of
the shocks $T_1$, $T_2$, and T are assumed to be independent, exponentially distributed
random variables with parameters $\lambda_1$, $\lambda_2$, and $\lambda$. Hence, the respective lifetimes X
and Y of the subsystems $S_1$ and $S_2$ are

$$X = \min(T_1, T) \quad \text{and} \quad Y = \min(T_2, T).$$

Thus, the lifetimes of the subsystems are clearly dependent, and their joint survival
probability is given by $\bar F_{X,Y}(x,y)$.
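The shock model lends itself to a simulation check. The sketch below draws the three exponential shock times, forms X and Y as the respective minima, and compares the empirical frequency of the event {X > x, Y > y} with the joint survival function. The parameter values, the evaluation point, the seed and the sample size are hypothetical choices made only for this illustration.

```python
import random
from math import exp

def sample_lifetimes(lam1, lam2, lam, rng):
    """One realization of (X, Y) = (min(T1, T), min(T2, T))."""
    t1 = rng.expovariate(lam1)   # shock destroying subsystem 1
    t2 = rng.expovariate(lam2)   # shock destroying subsystem 2
    t = rng.expovariate(lam)     # common shock destroying both subsystems
    return min(t1, t), min(t2, t)

def survival(x, y, lam1, lam2, lam):
    """Joint survival function P(X > x, Y > y) of the Marshall-Olkin model."""
    return exp(-lam1 * x - lam2 * y - lam * max(x, y))

rng = random.Random(12345)
lam1, lam2, lam = 1.0, 2.0, 0.5      # hypothetical parameter values
n = 200_000
x0, y0 = 0.3, 0.2                    # hypothetical evaluation point

hits = 0
for _ in range(n):
    x, y = sample_lifetimes(lam1, lam2, lam, rng)
    hits += (x > x0 and y > y0)

empirical = hits / n
exact = survival(x0, y0, lam1, lam2, lam)
print(empirical, exact)
```

The two printed values should agree up to the usual Monte Carlo error, which for this sample size is of the order of 0.001.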


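b) A random vector (X, Y) has a Gumbel distribution with positive parameters $\lambda_1, \lambda_2$
and parameter $\lambda$, $0 \le \lambda \le 1$, if its joint distribution function is given by

$$F_{X,Y}(x,y) = 1 - e^{-\lambda_1 x} - e^{-\lambda_2 y} + e^{-\lambda_1 x - \lambda_2 y - \lambda x y}, \quad x, y \ge 0. \tag{3.28}$$

The corresponding marginal distribution functions are

$$F_X(x) = 1 - e^{-\lambda_1 x}, \quad F_Y(y) = 1 - e^{-\lambda_2 y}; \quad x, y \ge 0,$$

so that the corresponding joint survival probability is

$$\bar F_{X,Y}(x,y) = P(X > x, Y > y) = e^{-\lambda_1 x - \lambda_2 y - \lambda x y}, \quad x, y \ge 0.$$

c) Another useful bivariate distribution of a random vector (X, Y) with exponential
marginal distributions is given for $x \ge 0$ and $y \ge 0$ by the joint distribution function

$$F_{X,Y}(x,y) = P(X \le x, Y \le y) = 1 - e^{-\lambda_1 x} - e^{-\lambda_2 y} + \left[ e^{+\lambda_1 x} + e^{+\lambda_2 y} - 1 \right]^{-1},$$

$\lambda_1, \lambda_2 > 0$. The corresponding marginal distribution functions are the same as the
ones of the Gumbel distribution. Again by property (7) on page 121, the joint survival
probability is

$$\bar F_{X,Y}(x,y) = P(X > x, Y > y) = \left[ e^{+\lambda_1 x} + e^{+\lambda_2 y} - 1 \right]^{-1}; \quad \lambda_1, \lambda_2 > 0, \quad x, y \ge 0.$$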


3.1.3 Linear Regression and Correlation Analysis 


For a given random vector (X, Y) the aim of this section is to approximate Y by a linear
function $\hat Y$ of X:

$$\hat Y = aX + b. \tag{3.29}$$

Such an approximation can be expected to yield good results if the regression function
$m_Y(x)$ of Y with regard to x is at least approximately a straight line:

$$m_Y(x) = E(Y \mid X = x) \approx \alpha x + \beta. \tag{3.30}$$


Whether this assumption is realistic in a practical situation can be checked empirically
by a scatter diagram of a sample: Let, for instance, X be the speed of a car and Y the
corresponding braking time to a full stop. Suppose n measurements of both speed and
corresponding braking time have been made. The result is a sample of structure

$$\{(x_i, y_i),\ i = 1, 2, \ldots, n\}.$$

If the scatter diagram of this sample looks principally like the one in Figure 3.6, then
assumption (3.29) is justified.

Figure 3.6 Scatter diagram for a linear regression function


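The mean squared deviation serves as criterion for the optimum fit of $\hat Y$ to Y:

$$Q(a,b) = E[(Y - \hat Y)^2] = E[Y - (aX + b)]^2. \tag{3.31}$$

The parameters a and b have to be determined such that Q(a, b) assumes its absolute
minimum. The necessary conditions are

$$\frac{\partial Q(a,b)}{\partial a} = 0, \quad \frac{\partial Q(a,b)}{\partial b} = 0. \tag{3.32}$$

By multiplying out the brackets in (3.31), Q(a, b) is seen to be

$$Q(a,b) = E(Y^2) - 2a\,E(XY) - 2b\,E(Y) + a^2 E(X^2) + 2ab\,E(X) + b^2 \tag{3.33}$$

so that the necessary conditions (3.32) become

$$\frac{\partial Q(a,b)}{\partial a} = -2E(XY) + 2a\,E(X^2) + 2b\,E(X) = 0,$$

$$\frac{\partial Q(a,b)}{\partial b} = -2E(Y) + 2a\,E(X) + 2b = 0.$$

The unique solution $(a, b) = (\alpha, \beta)$ is

$$\alpha = \frac{E(XY) - E(X)\,E(Y)}{\operatorname{Var}(X)}, \tag{3.34}$$

$$\beta = E(Y) - \alpha\,E(X). \tag{3.35}$$

Since

$$\frac{\partial^2 Q(a,b)}{\partial a^2} = 2E(X^2), \quad \frac{\partial^2 Q(a,b)}{\partial b^2} = 2, \quad \text{and} \quad \frac{\partial^2 Q(a,b)}{\partial a\, \partial b} = 2E(X),$$

the sufficient condition for an absolute minimum at $(a, b) = (\alpha, \beta)$ is fulfilled:

$$\frac{\partial^2 Q(a,b)}{\partial a^2} \cdot \frac{\partial^2 Q(a,b)}{\partial b^2} - \left( \frac{\partial^2 Q(a,b)}{\partial a\, \partial b} \right)^2 = 4 \left( E(X^2) - [E(X)]^2 \right) = 4 \operatorname{Var}(X) > 0.$$

With $\sigma_X^2 = \operatorname{Var}(X)$ and $\sigma_Y^2 = \operatorname{Var}(Y)$, the smallest possible mean square deviation of
$\hat Y$ from Y is obtained from (3.33) by substituting there a and b with $\alpha$ and $\beta$:

$$Q(\alpha, \beta) = \sigma_Y^2 - \alpha^2 \sigma_X^2. \tag{3.36}$$

$Q(\alpha, \beta)$ is the residual variance. The smaller $Q(\alpha, \beta)$, the better is the fit of $\hat Y$ to Y.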

Definition 3.3 The straight line

$$\hat y = \alpha x + \beta$$

is called regression line. The parameters $\alpha$ and $\beta$ are the regression coefficients.

Best Estimate If the regression function $m_Y(x)$ is not linear, then the 'random regression
line' $\hat Y(\alpha, \beta) = \alpha X + \beta$ is not the best estimate for Y with regard to the mean
squared deviation. Without proof, the following key result is given:

The best estimate for Y is $m_Y(X) = E(Y \mid X)$, i.e. for all real-valued functions g(x),

$$E(Y - E(Y \mid X))^2 \le E(Y - g(X))^2.$$

Only if the regression function $m_Y(x) = E(Y \mid x)$ is linear, $\hat Y(\alpha, \beta) = \alpha X + \beta$ is the best
estimate for Y with regard to the mean-squared deviation. In view of (3.26), this
proves an important property of the bivariate normal distribution:

If (X,Y) has a bivariate normal distribution, then the regression line

$$\hat Y(\alpha, \beta) = \alpha X + \beta$$

is the best possible estimate for Y with respect to the mean-squared deviation.


Covariance The covariance between two random variables X and Y is defined as

$$\operatorname{Cov}(X, Y) = E\big( [X - E(X)] \cdot [Y - E(Y)] \big). \tag{3.37}$$

By multiplying out the brackets, one obtains an equivalent formula for the covariance:

$$\operatorname{Cov}(X, Y) = E(XY) - E(X) \cdot E(Y). \tag{3.38}$$

The covariance has properties

$$\operatorname{Cov}(X, X) = \operatorname{Var}(X), \tag{3.39}$$

and

$$\operatorname{Cov}(X + Y, Z) = \operatorname{Cov}(X, Z) + \operatorname{Cov}(Y, Z). \tag{3.40}$$

From (3.14) and (3.38):

If two random variables are independent, then their covariance is 0.

For this reason, the covariance serves as a measure for the degree of statistical dependence
between two random variables. Generally one can expect that with increasing
absolute value |Cov(X, Y)| the degree of statistical dependence is increasing. But there
are examples (given later) which prove that Cov(X, Y) = 0 does not necessarily imply the
independence of X and Y.

In view of being a measure for the dependence of two random variables, it is not surprising
that the covariance between X and Y is a factor of $\alpha$ (see (3.34)). If X and Y
are independent, then Cov(X, Y) = 0. In this case the regression line has slope $\alpha = 0$,
i.e., it is a parallel to the x-axis, which gives no indication of a possible dependency
between X and Y.

Unfortunately, the covariance does not allow comparing the degree of dependency
between two different pairs of random variables, since it can in principle assume any
real value from $-\infty$ to $+\infty$.


Example 3.8 The random vector (X, Y) has the joint density

$$f_{X,Y}(x,y) = \frac{1}{2}\,xy, \quad 0 \le x \le y \le 2.$$

The marginal distributions are known from example 3.3:

$$f_X(x) = \frac{1}{4}\,(4x - x^3), \quad 0 \le x \le 2; \qquad f_Y(y) = \frac{1}{4}\,y^3, \quad 0 \le y \le 2.$$

X and Y are defined in such a way that they cannot be independent. The corresponding
mean values and variances are

$$E(X) = 16/15, \quad \operatorname{Var}(X) = 132/675,$$

$$E(Y) = 8/5, \quad \operatorname{Var}(Y) = 8/75.$$

By (3.13),

$$E(XY) = \int_0^2 \int_x^2 xy \cdot \frac{1}{2}\,xy\, dy\, dx = \frac{1}{2} \int_0^2 x^2 \left( \int_x^2 y^2\, dy \right) dx = \frac{1}{6} \int_0^2 x^2 (8 - x^3)\, dx = 16/9.$$

With these parameters, the regression coefficients can be calculated:

$$\alpha = \frac{\dfrac{16}{9} - \dfrac{16}{15} \cdot \dfrac{8}{5}}{\dfrac{132}{675}} = 0.36364,$$

$$\beta = \frac{8}{5} - \alpha \cdot \frac{16}{15} = 1.21212,$$

which gives the regression line

$$\hat y = 0.36364\,x + 1.21212.$$

Thus, an increase of X by one unit approximately implies on average an increase of Y
by 0.36364 units. The covariance between X and Y is 0.07111.

In view of the restriction of the joint density to the region $0 \le x \le y \le 2$, one would
expect that the regression line assumes at x = 2 the value 2 as well. But this is
not the case since $\hat y(2) \approx 1.94$. This is because the regression function $m_Y(x)$ is not a
straight line so that the regression line is only an approximation to $m_Y(x)$. The exact
average relationship between X and Y is given by the regression function:

$$m_Y(x) = E(Y \mid X = x) = \int_x^2 y\, \frac{f_{X,Y}(x,y)}{f_X(x)}\, dy = \int_x^2 y\, \frac{xy/2}{(4x - x^3)/4}\, dy = \frac{2}{4 - x^2} \int_x^2 y^2\, dy$$

$$= \frac{2}{3}\, \frac{8 - x^3}{4 - x^2}, \quad 0 \le x < 2.$$

Figure 3.7 shows that the largest differences between the regression function and the
regression line are at the left- and at the right-hand side of the x-interval [0, 2]. □

Figure 3.7 Regression function and regression line for example 3.8
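The regression coefficients of example 3.8 can be recomputed in exact rational arithmetic, which avoids any rounding in the intermediate steps. The sketch below starts from the moments stated in the example and applies (3.38), (3.34) and (3.35); the function names are illustrative choices.

```python
from fractions import Fraction as F

# Moments of example 3.8 as computed in the example.
e_x, e_y = F(16, 15), F(8, 5)
var_x = F(132, 675)
e_xy = F(16, 9)

cov_xy = e_xy - e_x * e_y          # covariance by (3.38)
alpha = cov_xy / var_x             # slope by (3.34)
beta = e_y - alpha * e_x           # intercept by (3.35)

def regression_line(x):
    """Value of the regression line at x."""
    return alpha * x + beta

def regression_function(x):
    """Regression function m_Y(x) of example 3.8, valid for 0 <= x < 2."""
    return F(2, 3) * (8 - x ** 3) / (4 - x ** 2)

print(alpha, beta, cov_xy)                  # exact fractions
print(float(alpha), float(beta), float(cov_xy))
print(float(regression_line(2)))            # value of the line at x = 2
print(float(regression_line(0)), float(regression_function(0)))
```

The exact values are α = 4/11, β = 40/33 and Cov(X, Y) = 16/225. At x = 0 the line gives 40/33 while the regression function gives 4/3, which illustrates the deviation at the left end of the interval.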


Correlation Coefficient The correlation coefficient $\rho = \rho(X, Y)$ between two random
variables X and Y with standard deviations $\sigma_X$ and $\sigma_Y$ is defined as the ratio

$$\rho(X, Y) = \frac{E\big( [X - E(X)] \cdot [Y - E(Y)] \big)}{\sigma_X\, \sigma_Y} = \frac{E(XY) - E(X) \cdot E(Y)}{\sigma_X\, \sigma_Y}. \tag{3.41}$$

The random variables X and Y are uncorrelated if $\rho(X, Y) = 0$, they are positively
correlated if $\rho(X, Y) > 0$, and negatively correlated if $\rho(X, Y) < 0$.

The correlation coefficient can be written as the mean value of the product of the
standardizations of X and Y:

$$\rho(X, Y) = E\left[ \left( \frac{X - E(X)}{\sigma_X} \right) \cdot \left( \frac{Y - E(Y)}{\sigma_Y} \right) \right]. \tag{3.42}$$

There is the following relationship to the covariance between X and Y:

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X\, \sigma_Y}. \tag{3.43}$$

Hence, X and Y are uncorrelated if and only if Cov(X, Y) = 0. If X and Y are independent,
then X and Y are uncorrelated. But the converse need not be true (see examples
3.11 and 3.12).




The Marshall-Olkin distribution and the Gumbel distribution (pages 132 and 133) are
examples for the equivalence of X and Y being independent and uncorrelated:

If (X,Y) has the Marshall-Olkin distribution (3.27), then the correlation coefficient
between X and Y is (exercise 3.18)

$$\rho(X, Y) = \frac{\lambda}{\lambda_1 + \lambda_2 + \lambda}.$$

$\rho(X, Y) = 0$ if and only if $\lambda = 0$. X and Y are independent if and only if $\lambda = 0$.

If (X, Y) has the Gumbel distribution (3.28) with $\lambda_1 = \lambda_2 = 1$, then the correlation
coefficient between X and Y is (without proof)

$$\rho(X, Y) = \int_0^{\infty} \frac{e^{-y}}{1 + \lambda y}\, dy - 1.$$

If $\lambda = 0$, then $\rho(X, Y) = 0$ and X and Y are independent, and, vice versa, if X and Y
are independent or $\rho(X, Y) = 0$, then $\lambda = 0$.


With the correlation coefficient, the regression coefficients $\alpha$ and $\beta$ can be written as
(compare to (3.26))

$$\alpha = \rho\, \frac{\sigma_Y}{\sigma_X}, \quad \beta = E(Y) - \rho\, \frac{\sigma_Y}{\sigma_X}\, E(X), \tag{3.44}$$

and another representation of the regression line is

$$\frac{\hat y - E(Y)}{\sigma_Y} = \rho\, \frac{x - E(X)}{\sigma_X}.$$

Therefore, when X and Y are positively (negatively) correlated, an increase in X will
on average lead to an increase (decrease) in Y. If X and Y are uncorrelated,
the regression line does not depend on x at all. Nevertheless, even in this case
there may be a dependency between X and Y, since X can have influence on the variability
of Y. Figure 3.8 illustrates this situation: If $\rho = 0$, the regression line is a
parallel to the x-axis, namely $\hat y \equiv E(Y)$. With increasing x the fluctuations of the
realizations of Y become larger and larger, but in such a way that E(Y) remains
constant.

Figure 3.8 Scatter diagram for (X, Y) indicating a dependence

Theorem 3.2 The correlation coefficient $\rho(X, Y)$ has the following properties:
(1) If X and Y are independent, then $\rho(X, Y) = 0$.

(2) If X and Y are linearly dependent, then $\rho(X, Y) = \pm 1$.

(3) For any random variables X and Y: $-1 \le \rho(X, Y) \le +1$.

Proof (1) The assertion follows from $\operatorname{Cov}(X, Y) = \sigma_X \sigma_Y\, \rho(X, Y)$ and (3.38).
(2) Let Y = aX + b for any $a \ne 0$ and b. Then, from (2.54) and (2.61),

$$E(Y) = a\,E(X) + b, \quad \sigma_Y^2 = a^2 \operatorname{Var}(X).$$

Now, from (3.42),

$$\rho(X, Y) = E\left[ \left( \frac{X - E(X)}{\sigma_X} \right) \left( \frac{a\,(X - E(X))}{|a|\, \sigma_X} \right) \right] = E\left[ \frac{a\,(X - E(X))^2}{|a|\, \sigma_X^2} \right] = \frac{a\, \sigma_X^2}{|a|\, \sigma_X^2} = \frac{a}{|a|} = \begin{cases} +1 & \text{if } a > 0, \\ -1 & \text{if } a < 0. \end{cases}$$

(3) Using (3.43), the residual variance (3.36) can be written in the form

$$Q(\alpha, \beta) = \sigma_Y^2\, (1 - \rho^2).$$

Since a quadratic deviation can never be negative and $\sigma_Y^2$ is positive anyway, the factor
$1 - \rho^2$ must be nonnegative. But $1 - \rho^2 \ge 0$ is equivalent to $-1 \le \rho \le +1$. ■


According to this theorem, the correlation coefficient can be interpreted as the covar- 
iance standardized to the interval [—1,+1]. In case of independence the correlation 
coefficient is 0; for linear (deterministic) dependence this coefficient assumes one of 
its extreme values -1 or +1. Thus, unlike the covariance, the correlation coefficient 
allows for comparing the (linear) dependencies between different pairs of random 
variables. However, the following examples show that even in case of (nonlinear)
functional dependence the correlation coefficient can be so close to 1 that the difference
is negligibly small, whereas, on the other hand, the correlation coefficient can
be 0 for non-linear functional dependence.


Example 3.9 The bending strength Y of a steel rod of a given length is given by the
equation $Y = cX^2$, where X is the diameter of the rod and the parameter c is a material
constant. X is a random variable, which has a uniform distribution in the interval
[3.92 cm, 4.08 cm]. The input parameters for $\rho(X, Y)$ are

$$E(X) = 4,$$

$$\operatorname{Var}(X) = \frac{1}{0.16} \int_{3.92}^{4.08} x^2\, dx - 16 = \frac{1}{0.48} \left[ x^3 \right]_{3.92}^{4.08} - 16 = 0.0021333,$$

$$E(Y) = \frac{c}{0.16} \int_{3.92}^{4.08} x^2\, dx = 16.0021333 \cdot c,$$

$$\operatorname{Var}(Y) = \frac{c^2}{0.16} \int_{3.92}^{4.08} x^4\, dx - [E(Y)]^2 = 0.1365380 \cdot c^2,$$

and

$$E(XY) = \frac{c}{0.16} \int_{3.92}^{4.08} x^3\, dx = 64.0256000 \cdot c.$$

Hence, the correlation coefficient between X and Y is

$$\rho(X, Y) = \frac{64.0256 \cdot c - 4 \cdot 16.0021333 \cdot c}{0.0461877 \cdot 0.3695105 \cdot c} = 0.9999976.$$

Although there is no linear functional relationship between X and Y, their correlation
coefficient is practically 1. (The extreme degree of numerical accuracy is required to
make sure that the calculated correlation coefficient does not exceed 1.) □
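Because the result is so sensitive to rounding, it is instructive to repeat the calculation with exact moments. The sketch below assumes, as the integrals in the example indicate, that the strength is proportional to the squared diameter, and it sets the material constant to 1 since this constant cancels in the correlation coefficient. The helper `moment` is an illustrative name.

```python
from fractions import Fraction as F
from math import sqrt

a, b = F(392, 100), F(408, 100)   # end points of the uniform distribution in cm

def moment(k):
    """k-th moment E(X^k) of the uniform distribution on [a, b]."""
    return (b ** (k + 1) - a ** (k + 1)) / ((k + 1) * (b - a))

# With Y = c X^2 the constant c cancels in the correlation coefficient,
# so c = 1 is used here (an assumption made for convenience).
var_x = moment(2) - moment(1) ** 2
var_y = moment(4) - moment(2) ** 2
cov_xy = moment(3) - moment(1) * moment(2)

# The moments are exact fractions; only the final square root is rounded.
rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))
print(float(var_x), float(var_y), float(cov_xy), rho)
```

Carried out this way, the coefficient evaluates to roughly 0.99999, in line with the conclusion that it is practically 1. The trailing digits of a hand calculation depend heavily on how the intermediate values are rounded.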


Example 3.10 Let Y = sin X, where X has a uniform distribution in the interval $[0, \pi]$,
i.e., it has density $f_X(x) = 1/\pi$, $0 \le x \le \pi$. The input parameters for Cov(X, Y) are

$$E(X) = \pi/2,$$

$$E(Y) = \frac{1}{\pi} \int_0^{\pi} \sin x\, dx = \frac{1}{\pi} \left[ -\cos x \right]_0^{\pi} = 2/\pi,$$

$$E(XY) = \frac{1}{\pi} \int_0^{\pi} x \sin x\, dx = \frac{1}{\pi} \left[ \sin x - x \cos x \right]_0^{\pi} = 1.$$

Hence, Cov(X, Y) = 0 so that $\rho(X, Y) = 0$ as well. Despite X and Y being functionally
related, they are uncorrelated. (Give an intuitive explanation for this.) □
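The three mean values of example 3.10 are easily confirmed by numerical integration. The sketch below uses a composite midpoint rule; the helper `midpoint` and the number of subintervals are assumptions made for this illustration.

```python
from math import sin, pi

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

density = 1.0 / pi   # density of the uniform distribution on [0, pi]

e_x = midpoint(lambda x: x * density, 0.0, pi)
e_y = midpoint(lambda x: sin(x) * density, 0.0, pi)
e_xy = midpoint(lambda x: x * sin(x) * density, 0.0, pi)
cov_xy = e_xy - e_x * e_y

print(e_x, e_y, e_xy, cov_xy)
```

The computed covariance vanishes up to the discretization error, although Y is a deterministic function of X.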


As mentioned before in section 3.1.2.3, if the random vector (X, Y) has a bivariate 
normal distribution, then the random variables X and Y are independent if and only if 
they are uncorrelated. There are bivariate distributions, which do not have this prop- 
erty, i.e., dependent random variables can be uncorrelated. This will be demonstrated 
by the following two examples. 


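Example 3.11 The random vector (X, Y) has the joint probability density

$$f_{X,Y}(x,y) = \frac{x^2 + y^2}{4\pi} \exp\left( -\frac{x^2 + y^2}{2} \right), \quad -\infty < x, y < +\infty.$$

Next the marginal densities of $f_{X,Y}(x,y)$ have to be determined:

$$f_X(x) = \int_{-\infty}^{+\infty} \frac{x^2 + y^2}{4\pi} \exp\left( -\frac{x^2 + y^2}{2} \right) dy$$

$$= \frac{e^{-x^2/2}}{2\sqrt{2\pi}} \left( x^2 \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy + \int_{-\infty}^{+\infty} y^2\, \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy \right).$$

The integrand of the first integral is the density of an N(0, 1)-distribution; the second
integral is the variance of an N(0, 1)-random variable. Both integrals are equal to 1
so that

$$f_X(x) = \frac{1}{2\sqrt{2\pi}}\, (x^2 + 1)\, e^{-x^2/2}, \quad -\infty < x < +\infty.$$

Since $f_{X,Y}(x,y)$ is symmetric in x and y,

$$f_Y(y) = \frac{1}{2\sqrt{2\pi}}\, (y^2 + 1)\, e^{-y^2/2}, \quad -\infty < y < +\infty.$$

Obviously, $f_{X,Y}(x,y) \ne f_X(x) \cdot f_Y(y)$ so that X and Y are not independent.

The mean value of XY is

$$E(XY) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} xy\, \frac{x^2 + y^2}{4\pi} \exp\left( -\frac{x^2 + y^2}{2} \right) dx\, dy$$

$$= \frac{1}{4\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^3 y\, e^{-(x^2 + y^2)/2}\, dx\, dy + \frac{1}{4\pi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x y^3\, e^{-(x^2 + y^2)/2}\, dx\, dy.$$

Both integrals in the last line are 0, since their integrands are odd functions with regard
to the origin. But E(X) and E(Y) are 0 as well, since $f_X(x)$ and $f_Y(y)$ are symmetric
functions with regard to the origin. Hence, E(XY) = E(X) · E(Y). Thus, X and Y
are uncorrelated, but not independent. □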


Regression line and correlation coefficient are defined for discrete random variables 


as well. The next example gives a discrete analogue to the previous one. 


Example 3.12 Let X and Y be two discrete random variables with ranges
$R_X = \{-2, -1, +1, +2\}$ and $R_Y = \{-1, 0, +1\}$.


Their joint distribution is given by Table 3.2: 

          | x = -2 | x = -1 | x = +1 | x = +2 | P(Y = y)
y = -1    |  1/16  |  2/16  |  2/16  |  1/16  |  6/16
y =  0    |  1/16  |  1/16  |  1/16  |  1/16  |  4/16
y = +1    |  1/16  |  2/16  |  2/16  |  1/16  |  6/16
P(X = x)  |  3/16  |  5/16  |  5/16  |  3/16  |  1


Table 3.2 Joint and marginal distribution for Example 3.12 


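From Table 3.2: The input parameters into the covariance between X and Y are

$$E(X) = \frac{1}{16} \left[ 3 \cdot (-2) + 5 \cdot (-1) + 5 \cdot (+1) + 3 \cdot (+2) \right] = 0,$$

$$E(Y) = \frac{1}{16} \left[ 6 \cdot (-1) + 4 \cdot 0 + 6 \cdot (+1) \right] = 0,$$

$$E(XY) = \frac{1}{16} \left[ (-2)(-1) + 2 \cdot (-1)(-1) + 2 \cdot (+1)(-1) + (+2)(-1) \right]$$

$$+ \frac{1}{16} \left[ (-2) \cdot 0 + (-1) \cdot 0 + (+1) \cdot 0 + (+2) \cdot 0 \right]$$

$$+ \frac{1}{16} \left[ (-2)(+1) + 2 \cdot (-1)(+1) + 2 \cdot (+1)(+1) + (+2)(+1) \right] = 0.$$

Hence, $\operatorname{Cov}(X, Y) = \rho(X, Y) = 0$ so that X and Y are uncorrelated.

On the other hand,

$$P(X = 2, Y = -1) = \frac{1}{16} \ne P(X = 2) \cdot P(Y = -1) = \frac{3}{16} \cdot \frac{6}{16} = \frac{9}{128}$$

so that X and Y are not independent. □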
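The calculations of example 3.12 can be repeated mechanically from a joint probability table. The table used in the sketch below is an assumption: its entries are the ones implied by the marginal sums and products appearing in the calculation of E(X), E(Y) and E(XY), written in units of 1/16.

```python
from fractions import Fraction as F

xs = (-2, -1, 1, 2)
# Assumed joint probabilities in units of 1/16, one row per value of Y.
rows = {-1: (1, 2, 2, 1), 0: (1, 1, 1, 1), 1: (1, 2, 2, 1)}
pmf = {(x, y): F(w, 16) for y, ws in rows.items() for x, w in zip(xs, ws)}

# Marginal distributions of X and Y.
p_x = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in xs}
p_y = {y: sum(p for (_, yy), p in pmf.items() if yy == y) for y in rows}

e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for y, p in p_y.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov_xy = e_xy - e_x * e_y

# Independence would require every joint probability to factorize.
independent = all(pmf[(x, y)] == p_x[x] * p_y[y] for (x, y) in pmf)

print(cov_xy, independent)
print(pmf[(2, -1)], p_x[2] * p_y[-1])
```

The covariance is exactly 0, while the factorization test fails, for instance at the pair (2, -1).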


In applications it is usually assumed that the random vector (X, Y) has a bivariate normal
distribution. The reasons for this are the following:

1) The regression line $\hat y = \alpha x + \beta$ coincides with the regression function
$m_Y(x) = E(Y \mid X = x)$.

Hence, $\hat Y = \alpha X + \beta$ is the best estimate for Y with regard to the mean squared deviation
of $\hat Y$ from Y.

2) X and Y are independent if and only if X and Y are uncorrelated.
3) Applicability of statistical procedures.


Statistical Approach to Linear Regression The approach to the linear regression
analysis adopted so far in this section is based on assuming that the joint distribution
of the random vector (X, Y) is known, including the numerical parameters involved.
The statistical approach is to estimate the numerical parameters based on a sample
$\{(x_i, y_i),\ i = 1, 2, \ldots, n\}$. This sample is obtained by repeating the random experiment
with outcome (X, Y) independently and under identical conditions n times and registering
the realizations $(x_i, y_i)$. The principle of minimizing the mean squared deviation
(3.31) is now applied to minimizing the arithmetic mean of the squared deviations of
the observed values $y_i$ from the ones given by the regression line $\hat y = \alpha x + \beta$, whose
coefficients $\alpha$ and $\beta$ are to be estimated:

$$Q(\alpha, \beta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2 \to \min. \tag{3.45}$$

This method of parameter estimation is called the method of least squares. Differentiating
(3.45) with respect to $\alpha$ and $\beta$ yields necessary and in this case also sufficient
conditions for the best least square estimates of $\alpha$ and $\beta$ (of course, the factor 1/n can
be ignored):

$$\sum_{i=1}^{n} x_i y_i - \alpha \sum_{i=1}^{n} x_i^2 - n\, \bar x\, \bar y + \alpha\, n\, \bar x^2 = 0,$$

$$\beta = \bar y - \alpha\, \bar x.$$

The unique solution is

$$\hat\alpha = \frac{\sum_{i=1}^{n} x_i y_i - n\, \bar x\, \bar y}{\sum_{i=1}^{n} x_i^2 - n\, \bar x^2} = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \tag{3.46}$$

$$\hat\beta = \bar y - \hat\alpha\, \bar x,$$

where $\bar x$ and $\bar y$ are the arithmetic means

$$\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar y = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

$\hat\alpha$ and $\hat\beta$ are (point) estimates of the unknown regression coefficients $\alpha$ and $\beta$. With
the additional notation

$$s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)^2, \quad s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar y)^2,$$

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i y_i - n\, \bar x\, \bar y \right),$$

the empirical regression coefficients $\hat\alpha$ and $\hat\beta$ can be rewritten as

$$\hat\alpha = \frac{s_{XY}}{s_X^2} = \hat\rho\, \frac{s_Y}{s_X}, \quad \hat\beta = \bar y - \hat\rho\, \frac{s_Y}{s_X}\, \bar x, \tag{3.47}$$

where $s_{XY}$, the empirical or sample covariance, is an estimate for the (theoretical)
covariance Cov(X, Y) between X and Y, and

$$\hat\rho = \hat\rho(X, Y) = \frac{s_{XY}}{s_X\, s_Y}, \tag{3.48}$$

the empirical or sample correlation coefficient, is an estimate for the (theoretical)
correlation coefficient $\rho = \rho(X, Y)$ between X and Y. With this notation and interpretation
the analogies between (3.43) and (3.47) as well as (3.41) and (3.48) are obvious.
It is interesting that the same estimates of the regression coefficients would have been
obtained if all mean values in (3.34) are replaced with the corresponding arithmetic
means. (Note that variances are mean values as well.) The fact that in $s_X^2$, $s_Y^2$, and $s_{XY}$
the factor 1/(n - 1) appears instead of 1/n is motivated by theorem 4.2 (page 188).


Example 3.13 In a virgin forest stand of yellowwoods (Podocarpus latifolius) in the
Soutpansberg, South Africa, 12 trees were randomly selected and had their stem
diameters (1.3 m above ground) and heights measured. Table 3.3 shows the results:

Tree number            |  1 |  2 |  3 |  4 |  5 |  6 |  7 |   8 |  9 | 10 | 11 | 12
Stem diameter [cm] x_i | 44 | 62 | 50 | 84 | 38 | 95 | 76 | 104 | 35 | 99 | 57 | 78
Height [m] y_i         | 32 | 48 | 38 | 56 | 31 | 62 | 57 |  73 | 28 | 76 | 41 | 49

Table 3.3 Stem diameters and the corresponding tree heights

Figure 3.9 Scatter diagram for example 3.13

Then,

$$\bar x = 68.50, \quad \bar y = 49.25, \quad s_X = 24.21, \quad s_Y = 16.03, \quad s_{XY} = 378.14.$$

This gives the empirical correlation coefficient as

$$\hat\rho = \frac{s_{XY}}{s_X \cdot s_Y} = \frac{378.14}{24.21 \cdot 16.03} = 0.974.$$

Hence, there is a strong linear connection between stem diameter and tree height. This
numerical result is in concordance with Figure 3.9. The empirical regression line,
therefore, adequately quantifies the average relationship between stem diameter and
tree height:

$$\hat y = \hat\alpha\, x + \hat\beta = 0.645\,x + 5.068.$$

Hence, the average increase of the height of a yellowwood is 0.645 m if the stem
diameter increases by 1 cm. □
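The estimates of example 3.13 follow directly from formulas (3.46) to (3.48) applied to the data of Table 3.3. The sketch below is one possible way of organizing this calculation; the function name `sample_regression` is an illustrative choice.

```python
from math import sqrt

diameters = [44, 62, 50, 84, 38, 95, 76, 104, 35, 99, 57, 78]   # x_i in cm
heights = [32, 48, 38, 56, 31, 62, 57, 73, 28, 76, 41, 49]      # y_i in m

def sample_regression(xs, ys):
    """Return (alpha_hat, beta_hat, rho_hat) according to (3.46) to (3.48)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)    # sample variance of x
    s_yy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)    # sample variance of y
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    alpha_hat = s_xy / s_xx
    beta_hat = y_bar - alpha_hat * x_bar
    rho_hat = s_xy / sqrt(s_xx * s_yy)
    return alpha_hat, beta_hat, rho_hat

alpha_hat, beta_hat, rho_hat = sample_regression(diameters, heights)
print(round(alpha_hat, 3), round(beta_hat, 3), round(rho_hat, 3))
```

Note that the factor 1/(n - 1) cancels in the ratios, so the same estimates would result with the factor 1/n.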


3.2 n-DIMENSIONAL RANDOM VARIABLES 


Let $X_1, X_2, \ldots, X_n$, $n \ge 2$, be continuous random variables with distribution functions

$$F_{X_1}(x_1),\ F_{X_2}(x_2),\ \ldots,\ F_{X_n}(x_n) \tag{3.49}$$

and probability densities

$$f_{X_1}(x_1),\ f_{X_2}(x_2),\ \ldots,\ f_{X_n}(x_n). \tag{3.50}$$

In what follows, let

$$\mathbf{X} = (X_1, X_2, \ldots, X_n).$$

The joint distribution function of the random vector $\mathbf{X}$ is defined as

$$F_{\mathbf{X}}(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n). \tag{3.51}$$

The marginal distribution functions $F_{X_i}(x_i)$ are obtained from $F_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$ by

$$F_{X_i}(x_i) = F_{\mathbf{X}}(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty); \quad i = 1, 2, \ldots, n.$$

Basic properties of the joint distribution function are:

1) $F_{\mathbf{X}}(x_1, x_2, \ldots, x_n) = 0$ if one or more of the $x_i$ are equal to $-\infty$.
2) $F_{\mathbf{X}}(+\infty, +\infty, \ldots, +\infty) = 1$.
3) $F_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$ is nondecreasing in each of $x_1, x_2, \ldots, x_n$.

Apart from the marginal distribution functions, $F_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$ yields the joint
distributions of all subvectors of $\mathbf{X}$. Let, for instance,

$$\{X_i, X_j\} \subset \{X_1, X_2, \ldots, X_n\}, \quad i < j, \quad n > 2.$$

Then the joint distribution function $F_{X_i, X_j}(x_i, x_j)$ of the random vector $(X_i, X_j)$ is

$$F_{X_i, X_j}(x_i, x_j) = F_{\mathbf{X}}(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty, x_j, \infty, \ldots, \infty).$$

In this way, the joint distribution functions of all subvectors

$$\{X_{i_1}, X_{i_2}, \ldots, X_{i_k}\} \subset \{X_1, X_2, \ldots, X_n\}, \quad k < n,$$

can be obtained. For instance, the joint distribution function of $(X_1, X_2, \ldots, X_k)$ is

$$F_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k) = F_{\mathbf{X}}(x_1, x_2, \ldots, x_k, \infty, \infty, \ldots, \infty), \quad k < n.$$


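The joint probability density of $\mathbf{X}$ is the nth mixed partial derivative of the joint
distribution function with respect to $x_1, x_2, \ldots, x_n$:

$$f_{\mathbf{X}}(x_1, x_2, \ldots, x_n) = \frac{\partial^n F_{\mathbf{X}}(x_1, x_2, \ldots, x_n)}{\partial x_1\, \partial x_2 \cdots \partial x_n}. \tag{3.52}$$

The characteristic properties of the two-dimensional densities can be extended in a
straightforward way to the n-dimensional densities. In particular, properties (3.11)
are special cases of

$$f_{\mathbf{X}}(x_1, x_2, \ldots, x_n) \ge 0, \quad \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1, \tag{3.53}$$

and the marginal densities are for all i = 1, 2, ..., n,

$$f_{X_i}(x_i) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n. \tag{3.54}$$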


Definition 3.4 (independence) The random variables $X_1, X_2, \ldots, X_n$ are (completely)
independent if and only if

$$F_{\mathbf{X}}(x_1, x_2, \ldots, x_n) = F_{X_1}(x_1) \cdot F_{X_2}(x_2) \cdots F_{X_n}(x_n).$$

For the practical relevance of this definition, see comment after formula (3.14), page
122. In terms of the densities, the $X_1, X_2, \ldots, X_n$ are independent if and only if

$$f_{\mathbf{X}}(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1) \cdot f_{X_2}(x_2) \cdots f_{X_n}(x_n). \tag{3.55}$$


Definition 3.4 also includes discrete random variables. However, for discrete random
variables $X_i$ (complete) independence can be equivalently defined by

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) \cdot P(X_2 = x_2) \cdots P(X_n = x_n) \tag{3.56}$$

for all $x_i \in R_{X_i}$; i = 1, 2, ..., n.

The intuitive meaning of independence is that the values which any of the $X_i$ have
assumed have no influence on the values which the remaining $X_i$ have taken on.

If the $X_i$ are independent, the set of the marginal distributions

$$\{F_{X_1}(x_1),\ F_{X_2}(x_2),\ \ldots,\ F_{X_n}(x_n)\}$$

contains the same amount of information on the probability distribution of the random
vector $\mathbf{X}$ as the joint probability distribution function.

If the $X_1, X_2, \ldots, X_n$ are independent, then every subset $\{X_{i_1}, X_{i_2}, \ldots, X_{i_k}\}$ of the set
$\{X_1, X_2, \ldots, X_n\}$ is independent as well. In particular, all possible pairs of random
variables $(X_i, X_j)$, $i \ne j$, are independent (pairwise independence of the $X_1, X_2, \ldots, X_n$).
As the following example shows, pairwise independence of the $X_1, X_2, \ldots, X_n$ does
not necessarily imply their complete independence.

Example 3.14 Let $A_1$, $A_2$, and $A_3$ be pairwise independent random events and $X_1$, $X_2$,
and $X_3$ their respective indicator variables:

$$X_i = \begin{cases} 1 & \text{if } A_i \text{ occurs}, \\ 0 & \text{otherwise}, \end{cases} \quad i = 1, 2, 3.$$

By (3.56), complete independence of the $X_1$, $X_2$, and $X_3$ would imply that

$$P(X_1 = 1, X_2 = 1, X_3 = 1) = P(X_1 = 1) \cdot P(X_2 = 1) \cdot P(X_3 = 1),$$

or equivalently that

$$P(A_1 \cap A_2 \cap A_3) = P(A_1) \cdot P(A_2) \cdot P(A_3).$$

However, we know from example 1.20 that the pairwise independence of random
events $A_1$, $A_2$, and $A_3$ does not necessarily imply their complete independence. □
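A concrete instance may help to see how pairwise independence can hold without complete independence. The sketch below uses a standard construction with two fair coin tosses; it is chosen here for illustration and is not necessarily the construction used in example 1.20.

```python
from fractions import Fraction as F
from itertools import product

# Two fair coin tosses; each of the four outcomes has probability 1/4.
outcomes = list(product((0, 1), repeat=2))
prob = {w: F(1, 4) for w in outcomes}

events = {
    1: {w for w in outcomes if w[0] == 1},        # A1: first toss shows heads
    2: {w for w in outcomes if w[1] == 1},        # A2: second toss shows heads
    3: {w for w in outcomes if w[0] == w[1]},     # A3: both tosses agree
}

def p(event):
    """Probability of an event given as a set of outcomes."""
    return sum(prob[w] for w in event)

# Check the product rule for every pair of events.
pairwise = all(
    p(events[i] & events[j]) == p(events[i]) * p(events[j])
    for i, j in ((1, 2), (1, 3), (2, 3))
)
triple = p(events[1] & events[2] & events[3])
product_of_three = p(events[1]) * p(events[2]) * p(events[3])

print(pairwise, triple, product_of_three)
```

Each pair of events satisfies the product rule, but the probability that all three events occur is 1/4, whereas the product of the three single probabilities is 1/8. Hence the corresponding indicator variables are pairwise but not completely independent.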


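The joint density of $(X_i, X_j)$, i < j, is

$$f_{X_i, X_j}(x_i, x_j) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_{j-1}\, dx_{j+1} \cdots dx_n,$$

whereas the joint density of $(X_1, X_2, \ldots, X_k)$, k < n, is

$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_{k+1} \cdots dx_n.$$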


Conditional densities can be obtained analogously to the two-dimensional case: For
instance, the conditional density of $(X_1, X_2, \ldots, X_n)$ given $X_i = x_i$ is

$$f_{\mathbf{X}}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \mid x_i) = \frac{f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)}{f_{X_i}(x_i)}, \tag{3.57}$$

and the conditional density of $(X_1, X_2, \ldots, X_n)$ given $(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$ is

$$f_{\mathbf{X}}(x_{k+1}, x_{k+2}, \ldots, x_n \mid x_1, x_2, \ldots, x_k) = \frac{f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)}{f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k)} \tag{3.58}$$

for k < n. Let $y = h(x_1, x_2, \ldots, x_n)$ be a real-valued function of n variables. Then the
mean value of the random variable $Y = h(X_1, X_2, \ldots, X_n)$ is defined as

$$E(Y) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h(x_1, x_2, \ldots, x_n)\, f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n. \tag{3.59}$$

In particular, the mean value of the product $Y = X_1 X_2 \cdots X_n$ is

$$E(X_1 X_2 \cdots X_n) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} x_1 x_2 \cdots x_n\, f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n.$$

Due to (3.55), for independent $X_i$ this n-dimensional integral simplifies to the product
of n one-dimensional integrals:

$$E(X_1 X_2 \cdots X_n) = E(X_1)\, E(X_2) \cdots E(X_n). \tag{3.60}$$

The mean value of the product of independent random variables is equal to the
product of the mean values of these random variables.


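The conditional mean value of $Y = h(X_1, \ldots, X_n)$ on condition $X_1 = x_1, \ldots, X_k = x_k$ is

$$E(Y \mid x_1, x_2, \ldots, x_k) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h(x_1, x_2, \ldots, x_n)\, \frac{f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)}{f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k)}\, dx_{k+1} \cdots dx_n. \tag{3.61}$$

Replacing in (3.61) the realizations $x_1, x_2, \ldots, x_k$ with the corresponding random variables
$X_1, X_2, \ldots, X_k$ yields the random mean value of Y on condition $X_1, X_2, \ldots, X_k$:

$$E(Y \mid X_1, X_2, \ldots, X_k) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} h(X_1, \ldots, X_k, x_{k+1}, \ldots, x_n)\, \frac{f_{\mathbf{X}}(X_1, \ldots, X_k, x_{k+1}, \ldots, x_n)}{f_{X_1, X_2, \ldots, X_k}(X_1, X_2, \ldots, X_k)}\, dx_{k+1} \cdots dx_n. \tag{3.62}$$

The mean value of this random variable (with respect to all $X_1, X_2, \ldots, X_k$) is

$$E_{X_1, X_2, \ldots, X_k}\big( E(Y \mid X_1, X_2, \ldots, X_k) \big) = E(Y). \tag{3.63}$$

For instance, the mean value of $E(Y \mid X_1, X_2, \ldots, X_k)$ with respect to the random variables
$X_1, X_2, \ldots, X_{k-1}$ is the random variable:

$$E_{X_1, X_2, \ldots, X_{k-1}}\big( E(Y \mid X_1, X_2, \ldots, X_k) \big) = E(Y \mid X_k). \tag{3.64}$$

Now it is obvious how to obtain the conditional mean values $E(Y \mid x_{i_1}, x_{i_2}, \ldots, x_{i_k})$ and
$E(Y \mid X_{i_1}, X_{i_2}, \ldots, X_{i_k})$ with regard to any subsets of

$$\{x_1, x_2, \ldots, x_n\} \quad \text{and} \quad \{X_1, X_2, \ldots, X_n\},$$

respectively.

Let $c_{ij} = \operatorname{Cov}(X_i, X_j)$ be the covariance between $X_i$ and $X_j$; i, j = 1, 2, ..., n. It is useful
to unite the $c_{ij}$ in the covariance matrix C:

$$C = ((c_{ij})); \quad i, j = 1, 2, \ldots, n.$$

The main diagonal of C consists of the variances of the $X_i$:

$$c_{ii} = \operatorname{Var}(X_i); \quad i = 1, 2, \ldots, n.$$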


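n-Dimensional Normal Distribution Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an n-dimensional
random vector with $\mu_i = E(X_i)$ for $i = 1, 2, \ldots, n$, and covariance matrix $\mathbf{C} = ((c_{ij}))$.

Furthermore, let $|\mathbf{C}|$ and $\mathbf{C}^{-1}$ be the positive determinant and the inverse of $\mathbf{C}$,
respectively, as well as

$$\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n) \quad \text{and} \quad \mathbf{x} = (x_1, x_2, \ldots, x_n).$$

$(X_1, X_2, \ldots, X_n)$ has an n-dimensionally normal (or Gaussian) distribution if it has
joint density

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\, |\mathbf{C}|}}\, \exp\left(-\frac{1}{2}\, (\mathbf{x} - \boldsymbol{\mu})\, \mathbf{C}^{-1}\, (\mathbf{x} - \boldsymbol{\mu})^T\right), \tag{3.65}$$

where $(\mathbf{x} - \boldsymbol{\mu})^T$ is the transpose of the vector

$$\mathbf{x} - \boldsymbol{\mu} = (x_1 - \mu_1, x_2 - \mu_2, \ldots, x_n - \mu_n).$$
By doing the matrix-vector multiplication in (3.65), $f_{\mathbf{X}}(\mathbf{x})$ becomes

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\, |\mathbf{C}|}}\, \exp\left(-\frac{1}{2\,|\mathbf{C}|} \sum_{i=1}^{n} \sum_{j=1}^{n} C_{ij}\, (x_i - \mu_i)(x_j - \mu_j)\right), \tag{3.66}$$

where $C_{ij}$ is the cofactor of $c_{ij}$.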


For $n = 2$, $x_1 = x$, and $x_2 = y$, (3.66) specializes to the density of the bivariate normal
distribution (3.24). Generalizing from the bivariate special case, it can be shown that
the random variables $X_i$ have an $N(\mu_i, \sigma_i^2)$-distribution with $\sigma_i^2 = c_{ii}$, $i = 1, 2, \ldots, n$,
if X has an n-dimensional normal distribution, i.e., the marginal distributions of X
are the one-dimensional normal distributions
$$N(\mu_i, \sigma_i^2); \quad i = 1, 2, \ldots, n.$$

If the $X_i$ are uncorrelated, then $\mathbf{C} = ((c_{ij}))$ is a diagonal matrix with $c_{ij} = 0$ for $i \ne j$
so that the joint density $f_{\mathbf{X}}(x_1, x_2, \ldots, x_n)$ assumes the product form (3.55):


$$f_{\mathbf{X}}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi}\, \sigma_i}\, \exp\left(-\frac{1}{2} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right) \right]. \tag{3.67}$$


Hence, jointly normally distributed $X_1, X_2, \ldots, X_n$ are independent if and only if they
are uncorrelated.

Theorem 3.3 Let the random vector $(X_1, X_2, \ldots, X_n)$ have an n-dimensionally normal
distribution. If the random variables $Y_1, Y_2, \ldots, Y_m$ are linear combinations of the $X_i$,
i.e., if there exist constants $a_{ij}$ so that


$$Y_i = \sum_{j=1}^{n} a_{ij} X_j; \quad i = 1, 2, \ldots, m,$$
then $(Y_1, Y_2, \ldots, Y_m)$ has an m-dimensional normal distribution (without proof).


The following two n-dimensional distributions are generalizations of the bivariate 
distributions (3.27) and (3.28), respectively. 


n-Dimensional Marshall-Olkin Distribution The random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
has an n-dimensional Marshall-Olkin distribution with positive parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$
and with nonnegative parameter $\lambda$ if it has the joint survival probability


$$\overline{F}(x_1, x_2, \ldots, x_n) = P(X_1 > x_1, X_2 > x_2, \ldots, X_n > x_n)$$


$$= e^{-\lambda_1 x_1 - \lambda_2 x_2 - \cdots - \lambda_n x_n - \lambda \max(x_1, x_2, \ldots, x_n)}, \quad x_i \ge 0, \ i = 1, 2, \ldots, n.$$


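n-Dimensional Gumbel Distribution The random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ has an
n-dimensional Gumbel distribution with positive parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$ and with
parameter $\lambda$, $0 \le \lambda \le 1$, if it has the joint survival probability


$$\overline{F}(x_1, x_2, \ldots, x_n) = P(X_1 > x_1, X_2 > x_2, \ldots, X_n > x_n)$$


$$= e^{-\lambda_1 x_1 - \lambda_2 x_2 - \cdots - \lambda_n x_n - \lambda\, x_1 x_2 \cdots x_n}, \quad x_i \ge 0, \ i = 1, 2, \ldots, n.$$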


3.3 EXERCISES 


3.1) Two dice are thrown. Their respective random outcomes are $X_1$ and $X_2$. Let
$X = \max(X_1, X_2)$ and Y be the number of even components of $(X_1, X_2)$. X and Y have
the respective ranges $R_X = \{1, 2, 3, 4, 5, 6\}$ and $R_Y = \{0, 1, 2\}$.

(1) Determine the joint probability distribution of the random vector (X, Y) and the
corresponding marginal distributions. Are X and Y independent?


(2) Determine E(X), E(Y), and E(XY). 


3.2) Every day a car dealer sells X cars of type 1 and Y cars of type 2. The following
table shows the joint distribution $\{r_{ij} = P(X = i, Y = j);\ i, j = 0, 1, 2\}$ of (X, Y).


         Y = 0   Y = 1   Y = 2
X = 0     0.1     0.1     0
X = 1     0.1     0.3     0.1
X = 2     0       0.2     0.1

(1) Determine the marginal distributions of (X, Y).
(2) Are X and Y independent?
(3) Determine the conditional mean values $E(X \mid Y = i)$, $i = 0, 1, 2$.


3.3) Let B be the upper half of the circle $x^2 + y^2 = 1$. The random vector (X, Y) is
uniformly distributed over B.


(1) Determine the joint density of (X, Y). 
(2) Determine the marginal distribution densities. 
(3) Are X and Y independent? Is theorem 3.1 applicable to answer this question? 


3.4) Let the random vector (X, Y) have a uniform distribution over a circle with radius
r = 2.


Determine the distribution function of the distance of the point (X, Y) from the center
of this circle.


3.5) Tessa and Vanessa have agreed to meet at a café between 16 and 17 o'clock. The 
arrival times of Tessa and Vanessa are X and Y, respectively. The random vector (X, Y) 
is assumed to have a uniform distribution over the square 


$B = \{(x, y);\ 16 \le x \le 17,\ 16 \le y \le 17\}$.
Whoever comes first will wait for 40 minutes and then leave.
What is the probability that Tessa and Vanessa will miss each other? 


3.6) Determine the mean length of a chord which is randomly chosen in a circle with
radius r. Consider separately the following ways of randomly choosing a chord:


(1) For symmetry reasons, the direction of the chord can be fixed in advance. Draw
the diameter of the circle which is perpendicular to this direction. The midpoints of
the chords are uniformly distributed over the whole length of the diameter.


(2) For symmetry reasons, one end point of the chord can be fixed at the periphery of
the circle. The direction of a chord is uniformly distributed over the interval $[0, \pi]$.


(3) How do you explain the different results obtained under (1) and (2)?


3.7) Matching bolts and nuts have the diameters X and Y, respectively. The random 
vector (X, Y) has a uniform distribution in a circle with radius 1mm and midpoint 
(30mm, 30mm). Determine the probabilities 


(1) P(Y>X), and (2) P(Y< X< 29). 


3.8) The random vector (X, Y) is defined as follows: X is uniformly distributed in the 
interval [0, 10]. On condition X = x, the random variable Y is uniformly distributed in 
the interval [0,x]. Determine 


(1) $f_{X,Y}(x, y)$, $f_Y(y)$, and $f_Y(y \mid x)$,
(2) $E(Y)$, $E(Y \mid X = 5)$, (3) $P(5 < Y \le 10)$.
3.9) Let
$$f_{X,Y}(x, y) = c\, x^2 y, \quad 0 \le x, y \le 1,$$
be the joint probability density of the random vector (X, Y).


(1) Determine the constant c and the marginal densities. 
(2) Are X and Y independent? 


3.10) The random vector (X, Y) has the joint probability density
$$f_{X,Y}(x, y) = \tfrac{1}{2}\, e^{-x}, \quad 0 \le x, \ 0 \le y \le 2.$$


(1) Determine the marginal densities and the mean values E(X) and E(Y).
(2) Determine the conditional densities $f_X(x \mid y)$ and $f_Y(y \mid x)$. Are X and Y
independent?
3.11) Let

$$f(x, y) = \tfrac{1}{2} \sin(x + y), \quad 0 \le x, y \le \pi/2,$$
be the joint probability density of the random vector (X, Y).


(1) Determine the marginal densities. 
(2) Are X and Y independent? 
(3) Determine the conditional mean value E(Y|X = x). 


(4) Compare the numerical values $E(Y \mid X = 0)$ and $E(Y \mid X = \pi/2)$ to E(Y). Are the
results in line with your answer to (2)?


3.12) The temperatures X and Y, measured daily at the same time at two different lo- 
cations, have the joint density 


( 3 \ 
fayny) = > exp] a t-F-j |» OSxySo. 


Determine the probabilities 
P(X> Y) and P(X < Y< 3X). 


3.13) A large population of rats had been fed with individually varying mixtures of 
wholegrain wheat and puffed wheat to see whether the composition of the food has 
any influence on the lifetimes of the rats. Let Y be the lifetime of a rat and X the cor- 
responding ratio of wholegrain it had in its food. An evaluation of (real life) data jus- 
tifies the assumption that the random vector (X, Y) has a bivariate normal distribution 
with parameters (in months) 


$\mu_X = 0.50$, $\sigma_X^2 = 0.028$, $\mu_Y = 6.0$, $\sigma_Y^2 = 3.61$, and $\rho = 0.92$.


With these parameters, X and Y are unlikely to assume negative values.
(1) Determine the regression function $m_Y(x)$, $0 \le x \le 1$, and the corresponding
residual variance.


(2) Determine the probability $P(Y \ge 8, X \le 0.6)$.
(You may use software you are familiar with to numerically calculate this probability.
Otherwise, only produce the double integral.)


3.14) In a forest stand, the stem diameter X (measured 1.3 m above ground) and the 
corresponding tree height Y have a bivariate normal distribution with joint density 


YO) = 0785 18 G2 0.4 25 


Remark With this joint density, negative values of X and Y are extremely unlikely. 


Determine 
(1) the correlation coefficient p = p(X, Y), and 
(2) the regression line $\tilde{y} = \alpha x + \beta$.
3.15) The prices per unit X and Y of two related stocks have a bivariate normal dis- 
tribution with parameters 

$\mu_X = 24$, $\sigma_X^2 = 49$, $\mu_Y = 36$, $\sigma_Y^2 = 144$, and $\rho = 0.8$.
(1) Determine the probabilities

$P(|Y - X| \le 10)$ and $P(|Y - X| > 15)$.


You may make use of software you are familiar with to numerically calculate these
probabilities. Otherwise, only produce the respective double integrals.


(2) Determine the regression function m y(x) and corresponding residual variance. 
3.16) (X, Y) has the joint distribution function $F_{X,Y}(x, y)$. Show that

$$P(a < X \le b,\ c < Y \le d) = [F_{X,Y}(b, d) - F_{X,Y}(b, c)] - [F_{X,Y}(a, d) - F_{X,Y}(a, c)]$$
for $a < b$ and $c < d$. (This is formula (3.7), page 121.) For illustration, see the Figure:


[Figure: rectangle in the (x, y)-plane with corners (a, c), (b, c), (a, d), and (b, d); the rectangle is hatched.]


The area integral of the joint probability density $f_{X,Y}(x, y)$ over the hatched area
gives the desired probability.
3.17) Let a function of two variables x and y be given by


$$F(x, y) = \begin{cases} 0 & \text{for } x + y \le 0, \\ 1 & \text{for } x + y > 0. \end{cases}$$
Show that F(x, y) does not fulfill the condition
$$[F(b, d) - F(b, c)] - [F(a, d) - F(a, c)] \ge 0$$


for all a, b, c, and d with $a < b$ and $c < d$. Hence, although F(x, y) is continuous on

the left in x and y and nondecreasing in x and y, it cannot be the joint distribution

function of a random vector (X, Y).

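3.18) The vector (X, Y) has the joint distribution function $F_{X,Y}(x, y)$. Show that
$$P(X > x, Y > y) = 1 - F_Y(y) - F_X(x) + F_{X,Y}(x, y).$$


3.19) The random vector (X, Y) has the joint distribution function (Marshall-Olkin
distribution, page 132) with parameters $\lambda_1 > 0$, $\lambda_2 > 0$, and $\lambda \ge 0$


$$F_{X,Y}(x, y) = 1 - e^{-(\lambda_1 + \lambda) x} - e^{-(\lambda_2 + \lambda) y} + e^{-\lambda_1 x - \lambda_2 y - \lambda \max(x, y)}.$$


Show that the correlation coefficient between X and Y is given by
$$\rho(X, Y) = \frac{\lambda}{\lambda_1 + \lambda_2 + \lambda}.$$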
3.20) At time t = 0, a parallel system S consisting of two elements $e_1$ and $e_2$ starts
operating. Their lifetimes $X_1$ and $X_2$ are dependent with joint survival function


1 


F = P(X >x1,X2 >x2)= 
(x1,%2) = P(X] > x1,X2 > x2) 701m 4 Hn] 


> X1,X2 20. 


(1) What are the distribution functions of $X_1$ and $X_2$?
(2) What is the probability that the system survives the interval [0, 10]? 


Note By definition, a parallel system is fully operating at a time point t if at least one of its
elements is still operating at time t, i.e., a parallel system fails at that time point when the last
of its operating elements fails. See also example 4.16, page 176.


3.21) Prove the conditional variance formula 
Var(X) = E[Var(X|Y)] + Var[E(X|Y)]. 
Hint Make use of formulas (2.62) and (3.21). 
3.22) The random edge length X of a cube has a uniform distribution in the interval
[4.8, 5.2]. Determine the correlation coefficient $\rho = \rho(X, Y)$, where $Y = X^3$ is the
volume of the cube.
3.23) The edge length X of an equilateral triangle is uniformly distributed in the
interval [9.9, 10.1]. Determine the correlation coefficient between X and the area Y of the
triangle.


3.24) The random vector (X, Y) has the joint density
$$f_{X,Y}(x, y) = 8xy, \quad 0 < y \le x \le 1.$$
Determine
(1) the correlation coefficient $\rho(X, Y)$,
(2) the regression line $\tilde{y} = \alpha x + \beta$ of Y with regard to X,
(3) the regression function $y = m_Y(x)$.
3.25) The random variables U and V are uncorrelated and have mean value 0. Their 


variances are 4 and 9, respectively. 


Determine the correlation coefficient p(X, Y) between the random variables 


$X = 2U + 3V$ and $Y = U - 2V$.


3.26) The random variable Z is uniformly distributed in the interval $[0, 2\pi]$.
Check whether the random variables $X = \sin Z$ and $Y = \cos Z$ are uncorrelated.


CHAPTER 4 


Functions of Random Variables 


4.1 FUNCTIONS OF ONE RANDOM VARIABLE 


4.1.1 Probability Distribution 


Functions of a random variable have already played important roles in the previous
two chapters. For instance, the nth moment of a random variable X is the mean value
of the random variable $Y = X^n$, the variance of X is the mean value of the random
variable $Y = (X - E(X))^2$, a standard random variable S is defined by

$$S = \frac{X - E(X)}{\sqrt{Var(X)}},$$

and the Laplace transform of the density of X is defined as the mean value of the
random variable $Y = e^{-sX}$. In each case, a function $y = h(x)$ is given, which assigns a
value y to each realization x of X. Since it is random which value X assumes, it is also
random which value h(X) takes on. In this way, a new random variable is generated,
which is denoted as $Y = h(X)$. Hence, the focus is not primarily on the values
assumed by X, but on the values assumed by Y. The situation is quite analogous to
the one which occurred when making the transition from the outcomes $\omega$, $\omega \in \Omega$, of
the underlying random experiment to the corresponding values of a random variable
$X = X(\omega)$ (section 2.1). Theoretically, one could directly assign to every elementary
event $\omega$ the value $y = h(X(\omega))$ instead of making a detour via X, as the probability
distribution of Y is fully determined by the one of X:


$$P(Y \in A) = P(X \in h^{-1}(A)),$$


where $h^{-1}(A) = \{x;\ h(x) \in A\}$ is the preimage of A under h (for an invertible h, $h^{-1}$ is the
inverse function of h). A motivation for making this detour is given
by an example: The area of a circle with diameter D has to be determined. In view of
a random measurement error $\Delta$, the true diameter D is not known so that one has to
work with an estimate for D, namely with the random variable $X = D + \Delta$. This gives,
instead of the true area of the circle $A = A(D) = \frac{\pi}{4} D^2$, only a random estimate of A:


$$Y = h(X) = \frac{\pi}{4} X^2.$$


The aim is to obtain from the probability distribution of X, assumed to be known, the
desired probability distribution of Y. Another situation: A random signal X is emitted
by a source (the useful signal), which arrives at the receiver as $Y = \sin X$. The receiver
knows that this coding takes place and has information on the probability
distribution of Y. Based on this knowledge, the receiver needs to extract information
on the probability distribution of the useful signal.
a) Strictly increasing h(x) Let X be a continuous random variable with distribution
function $F_X(x) = P(X \le x)$ and with range


$$R_X = [a, b], \quad -\infty \le a < b \le +\infty.$$
h(x) is assumed to be a differentiable and strictly increasing function on $R_X$. Hence,
to every $x_0$ there exists exactly one $y_0$ so that $y_0 = h(x_0)$ and vice versa. This implies
the existence of the inverse function of h(x), which will be denoted as


$$x = x(y) = h^{-1}(y).$$
Its defining property is $h^{-1}(h(x)) = x$ for all $x \in R_X$. The domain of definition of $h^{-1}$
is given by
$$R_Y = \{y;\ y = h(x),\ x \in R_X\}.$$
$R_Y$ is also the range of the random variable $Y = h(X)$.
To derive the distribution function of Y note that the random event "$h(X) \le y_0$" occurs
if and only if the random event "$X \le h^{-1}(y_0) = x_0$" occurs. Therefore, for all $y \in R_Y$,
the distribution function of Y can be obtained from $F_X$:
$$F_Y(y) = P(Y \le y) = P(h(X) \le y) = P(X \le h^{-1}(y)) = F_X(h^{-1}(y)), \quad y \in R_Y.$$
Using the chain rule, differentiation of $F_Y(y)$ with regard to y yields the probability
density $f_Y(y)$ of Y:


$$f_Y(y) = \frac{dF_Y(y)}{dy} = f_X(h^{-1}(y))\, \frac{dh^{-1}(y)}{dy} = f_X(x(y))\, \frac{dx}{dy}.$$
b) Strictly decreasing h(x) Under otherwise the same assumptions and notations as
under a), let h(x) be a strictly decreasing function in $R_X$. In this case, the random
event "$h(X) \le y_0$" occurs if and only if the random event "$X \ge h^{-1}(y_0) = x_0$" occurs.
Hence, since X is continuous, for all $y \in R_Y$,


$$F_Y(y) = P(Y \le y) = P(h(X) \le y) = P(X > h^{-1}(y)) = 1 - F_X(h^{-1}(y)), \quad y \in R_Y.$$
Differentiation of $F_Y(y)$ with regard to y yields the corresponding density:


$$f_Y(y) = -f_X(h^{-1}(y))\, \frac{dh^{-1}(y)}{dy} = -f_X(x(y))\, \frac{dx}{dy},$$
which is nonnegative, since $dx/dy < 0$ for a strictly decreasing h.


Summarizing If $y = h(x)$ is strictly increasing, the distribution function of $Y = h(X)$ is


$$F_Y(y) = F_X(h^{-1}(y)), \quad y \in R_Y. \tag{4.1}$$
If $y = h(x)$ is strictly decreasing, then
$$F_Y(y) = 1 - F_X(h^{-1}(y)), \quad y \in R_Y. \tag{4.2}$$


In both cases, the probability density of $Y = h(X)$ is


$$f_Y(y) = f_X(h^{-1}(y)) \left| \frac{dh^{-1}(y)}{dy} \right| = f_X(x(y)) \left| \frac{dx}{dy} \right|, \quad y \in R_Y. \tag{4.3}$$


In the important special case of a linear transformation $h(x) = ax + b$, the inverse
function of h(x) is $h^{-1}(y) = (y - b)/a$ so that the results (4.1) to (4.3) specialize to


$$F_Y(y) = F_X\left(\frac{y - b}{a}\right) \quad \text{for } a > 0,$$


$$F_Y(y) = 1 - F_X\left(\frac{y - b}{a}\right) \quad \text{for } a < 0, \tag{4.4}$$


$$f_Y(y) = \frac{1}{|a|}\, f_X\left(\frac{y - b}{a}\right) \quad \text{for } a \ne 0.$$
As pointed out before, in this case
$$E(Y) = a\,E(X) + b, \quad Var(Y) = a^2\, Var(X). \tag{4.5}$$
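
The linear case can be checked numerically. The sketch below is only an illustration under assumptions not made in the text: X is taken to be exponentially distributed with parameter 1, the coefficients a = -2 and b = 3 are arbitrary, and the helper `integrate` is a plain midpoint rule. It integrates the density from (4.4) and compares mean and variance with (4.5).

```python
import math

def f_x(x):
    """Density of an exponential distribution with parameter 1 (assumed example)."""
    return math.exp(-x) if x >= 0 else 0.0

def f_y(y, a, b):
    """Density of Y = a*X + b for a != 0, following the last line of (4.4)."""
    return f_x((y - b) / a) / abs(a)

def integrate(g, lo, hi, steps=200_000):
    """Midpoint rule on [lo, hi] (hypothetical helper, not from the text)."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b = -2.0, 3.0
# For a < 0 the range of Y is (-inf, b]; below b - 80 the density is negligible.
lo, hi = b - 80.0, b
total = integrate(lambda y: f_y(y, a, b), lo, hi)
mean_y = integrate(lambda y: y * f_y(y, a, b), lo, hi)
var_y = integrate(lambda y: (y - mean_y) ** 2 * f_y(y, a, b), lo, hi)

# (4.5) predicts E(Y) = a*1 + b = 1 and Var(Y) = a^2 * 1 = 4.
print(total, mean_y, var_y)
```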


Example 4.1 The distribution density of the random variable X is
$$f_X(x) = 1/x^2, \quad x \ge 1.$$
Integration yields the distribution function of the shifted Lomax distribution


$$F_X(x) = \frac{x - 1}{x}, \quad x \ge 1.$$


Distribution function and density of the random variable $Y = e^{-X}$ have to be determined.
The function $h(x) = e^{-x}$ transforms the range $R_X = [1, \infty)$ of X to the range


$$R_Y = (0, 1/e]$$


of $Y = e^{-X}$. Since h(x) is strictly decreasing and $x(y) = h^{-1}(y) = -\ln y$, equations (4.2)
and (4.3) yield


$$F_Y(y) = -\frac{1}{\ln y} \quad \text{and} \quad f_Y(y) = \frac{1}{y\, (\ln y)^2}, \quad 0 < y \le 1/e. \qquad \square$$
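
The result of Example 4.1 can be cross-checked by simulation. This is a sketch under assumptions: realizations of X are produced by solving $u = F_X(x) = 1 - 1/x$ for x with a uniform random number u, and the seed and sample size are arbitrary choices.

```python
import math
import random

def sample_x(rng):
    """Draw X with F_X(x) = 1 - 1/x, x >= 1, by solving u = F_X(x) for x."""
    u = rng.random()              # uniform in [0, 1)
    return 1.0 / (1.0 - u)

def cdf_y(y):
    """F_Y(y) = -1 / ln(y) for 0 < y <= 1/e, as derived in Example 4.1."""
    return -1.0 / math.log(y)

rng = random.Random(12345)        # fixed seed, chosen arbitrarily
n = 200_000
ys = [math.exp(-sample_x(rng)) for _ in range(n)]

# Compare empirical and theoretical distribution function at a few points.
results = {}
for y0 in (0.05, 0.15, 0.30):
    empirical = sum(1 for y in ys if y <= y0) / n
    results[y0] = (empirical, cdf_y(y0))
    print(y0, round(empirical, 4), round(cdf_y(y0), 4))
```

The empirical relative frequencies should be close to the values of $-1/\ln y$.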
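Example 4.2 X has an exponential distribution with parameter $\lambda = 1$:
$$f_X(x) = e^{-x}, \quad x \ge 0.$$


The density of $Y = 3 - X^3$ has to be determined. Since $y = h(x) = 3 - x^3$, the range of
$Y = h(X)$ is $R_Y = (-\infty, 3]$. Moreover,


$$x = h^{-1}(y) = (3 - y)^{1/3} \quad \text{and} \quad \frac{dx}{dy} = -\frac{1}{3}\,(3 - y)^{-2/3}, \quad y \in R_Y.$$


With these relations, equation (4.3) yields
$$f_Y(y) = \frac{\exp\left(-(3 - y)^{1/3}\right)}{3\,(3 - y)^{2/3}}, \quad -\infty < y < 3. \qquad \square$$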


Example 4.3 A body with mass m moves along a straight line with a random velocity
X, which is uniformly distributed in the interval [0.8, 1.2]. What is the probability
density of the body's kinetic energy $Y = \frac{1}{2} m X^2$, and what is its mean kinetic energy?


X has density
$$f_X(x) = \frac{1}{0.4} = 2.5, \quad 0.8 \le x \le 1.2.$$
By the transformation $h(x) = \frac{1}{2} m x^2$, the range $R_X = [0.8, 1.2]$ of X is transformed to
the range $R_Y = [0.32\,m, 0.72\,m]$ of Y. Since


$$x = h^{-1}(y) = \sqrt{\frac{2y}{m}}, \quad \frac{dx}{dy} = \frac{1}{\sqrt{2my}}, \quad y \in R_Y,$$


and $f_X(x)$ is constant in $R_X$, equation (4.3) yields
$$f_Y(y) = \frac{2.5}{\sqrt{2m}}\, \frac{1}{\sqrt{y}}, \quad 0.32\,m \le y \le 0.72\,m.$$
The mean kinetic energy of the body with mass m is


$$E(Y) = \int_{0.32m}^{0.72m} y\, f_Y(y)\, dy = \frac{2.5}{\sqrt{2m}} \int_{0.32m}^{0.72m} y^{1/2}\, dy = \frac{2.5}{\sqrt{2m}} \cdot \frac{2}{3} \left[ (0.72\,m)^{3/2} - (0.32\,m)^{3/2} \right]$$


so that $E(Y) \approx 0.506\, m$. $\square$


Figure 4.1 Nonmonotone h(x) 


Nonmonotone h(x) Equations analogous to (4.1) and (4.2) can also be established for
nonmonotone functions h(x).
As a special case, let us assume that $y = h(x)$ assumes an absolute maximum at $x = x_0$
(Figure 4.1). More exactly, let
$$h(x) = \begin{cases} h_1(x) & \text{for } x \le x_0, \\ h_2(x) & \text{for } x > x_0, \end{cases}$$


where $h_1(x)$ and $h_2(x)$ are strictly increasing and strictly decreasing, respectively, in
their respective domains of definition. Then the random event "$Y \le y$" with $Y = h(X)$
can be written in the following form:

"$Y \le y$" = "$h_1(X) \le y$" $\cup$ "$h_2(X) \le y$"
(Figure 4.1). Hence,
$$F_Y(y) = P(h(X) \le y) = P(h_1(X) \le y) + P(h_2(X) \le y) = P(X \le h_1^{-1}(y)) + P(X > h_2^{-1}(y)).$$
Thus, $F_Y(y)$ can be represented as
$$F_Y(y) = F_X(h_1^{-1}(y)) + 1 - F_X(h_2^{-1}(y)), \quad y \in R_Y. \tag{4.6}$$
Differentiating $F_Y(y)$ and letting $x_1 = h_1^{-1}(y)$ and $x_2 = h_2^{-1}(y)$ yields the probability
density of Y:
$$f_Y(y) = f_X(x_1(y)) \left| \frac{dx_1}{dy} \right| + f_X(x_2(y)) \left| \frac{dx_2}{dy} \right|, \quad y \in R_Y. \tag{4.7}$$
This representation of $f_Y(y)$ is also valid if h(x) assumes an absolute minimum at $x = x_0$.


Example 4.4 A lawn sprinkler moves the direction of its nozzle from horizontal to
perpendicular, i.e., within the angular area from 0 to $\pi/2$, with constant angular
velocity. Possible rotation movements of the nozzle do not play any role in what follows.
It has to be checked whether in this way the lawn, assumed to be a horizontal plane,
is evenly irrigated, i.e., whether every part of the lawn receives on average the same
amount of water per unit time.


(x, z)-coordinates are introduced in that plane in which the trajectory of a water drop
is embedded. The nozzle is supposed to be in the origin (0, 0) of this plane. It is known
from physics that a drop of water, which leaves the nozzle at time 0 with velocity s
and angle $\alpha$ to the lawn, is at time t at location (air resistance being neglected)


$$x = s\,t \cos\alpha, \quad z = s\,t \sin\alpha - \tfrac{1}{2}\, g\, t^2,$$
where t is such that $z \ge 0$, and g denotes the gravitational acceleration:
$$g \approx 9.81\ \mathrm{m\,s^{-2}}.$$
As soon as z becomes 0, the drop of water lands. This happens at time
$$t_L = \frac{2s}{g} \sin\alpha.$$


Figure 4.2 Trajectory of a water drop
Figure 4.3 Trajectories of several drops

The corresponding x-coordinate is (Figure 4.2)

$$x_L = a \sin 2\alpha \quad \text{with} \quad a = s^2/g,$$
since $\sin 2\alpha = 2 \sin\alpha \cos\alpha$. From this results the well-known fact that under the
assumptions stated, a drop of water, just as any other particle, flies farthest if the start
angle is 45° (Figure 4.3). Since the nozzle moves with constant angular velocity, the
start angle of a drop of water leaving the nozzle at a random time point is a random
variable $\alpha$ with density


$$f_\alpha(\alpha) = \frac{2}{\pi}, \quad 0 \le \alpha \le \frac{\pi}{2}, \tag{4.8}$$
i.e., $\alpha$ is uniformly distributed in the interval $[0, \pi/2]$. The lawn, under the irrigation
policy adopted, will be evenly irrigated if and only if the random landing point
$$X = a \sin 2\alpha$$


with range $R_X = [0, a]$ has a uniform distribution in the interval [0, a]. This
seems to be unlikely, and the probabilistic analysis will confirm this suspicion.


Figure 4.4 Graph of $h(\alpha) = a \sin 2\alpha$
Figure 4.5 'Irrigation density'


The function $x = h(\alpha) = a \sin 2\alpha$, $0 \le \alpha \le \pi/2$, assumes its absolute maximum a at the
location $\alpha = \pi/4$ (Figure 4.4). The function $x = h(\alpha) = h_1(\alpha)$ is strictly increasing in
$[0, \pi/4]$, and $x = h(\alpha) = h_2(\alpha)$ is strictly decreasing in the interval $[\pi/4, \pi/2]$. In view of this,
for all $0 < x < a$,


$$\alpha_1 = h_1^{-1}(x) = \frac{1}{2} \arcsin\frac{x}{a}, \quad \alpha_2 = h_2^{-1}(x) = \frac{\pi}{2} - \frac{1}{2} \arcsin\frac{x}{a}.$$
Differentiation with regard to x yields


$$\left| \frac{d\alpha_1}{dx} \right| = \left| \frac{d\alpha_2}{dx} \right| = \frac{1}{2a \sqrt{1 - (x/a)^2}}.$$


Now (4.7) and (4.8) yield
$$f_X(x) = \frac{2}{\pi} \cdot \frac{1}{2a \sqrt{1 - (x/a)^2}} + \frac{2}{\pi} \cdot \frac{1}{2a \sqrt{1 - (x/a)^2}}$$
so that the final result is
$$f_X(x) = \frac{2}{\pi a \sqrt{1 - (x/a)^2}}, \quad 0 \le x < a.$$


This density tends to $\infty$ if $x \to a$ (Figure 4.5). Therefore, the outer area to be irrigated
will get more water than the area next to the nozzle. A 'fair' irrigation can only be
achieved with varying angular speed of the nozzle. (Note that in order to be in line
with the adequate (x, z)-system of coordinates used in this example, the roles of the
variables x and y in formulas (4.3) and (4.7) have been taken over by $\alpha$ and x,
respectively.) $\square$
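
The uneven irrigation found in Example 4.4 can be made visible by simulation. The following sketch makes several assumptions for illustration only: a = 1, an arbitrary seed and sample size, and the distribution function $F_X(x) = \frac{2}{\pi} \arcsin(x/a)$, which follows from integrating the density of the landing point.

```python
import math
import random

def landing_cdf(x, a):
    """F_X(x) = (2/pi) * arcsin(x/a), the integral of the landing-point density."""
    return 2.0 / math.pi * math.asin(x / a)

a = 1.0                           # a = s^2 / g, set to 1 for illustration
rng = random.Random(2024)         # arbitrary fixed seed
n = 200_000

# Start angles are uniform in [0, pi/2]; each drop lands at a * sin(2 * angle).
xs = [a * math.sin(2.0 * rng.uniform(0.0, math.pi / 2.0)) for _ in range(n)]

inner = sum(1 for x in xs if x <= 0.5 * a) / n   # share landing in [0, a/2]
outer = 1.0 - inner                              # share landing in (a/2, a]
print(inner, landing_cdf(0.5 * a, a), outer)
```

Under these assumptions only about one third of the drops land in the inner half of [0, a], in line with the density increasing towards a.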


The derivation of the density $f_Y(y)$ for $Y = h(X)$ (formulas (4.3) and (4.7)) was done
in two basic steps:


1) The distribution function $F_Y(y)$ is expressed in terms of $F_X$.
2) The distribution function $F_Y(y)$ is differentiated.


For nonmonotonic functions $y = h(x)$ it is frequently more convenient, instead of
meticulously following (4.7), to do these two steps individually, tailored to the
respective problem. This will be illustrated by the following example.


Figure 4.6 Parabola $y = x^2$


Example 4.5 X has distribution function $F_X(x)$ and density $f_X(x)$ in the
range $R_X = (-\infty, +\infty)$. The density of $Y = X^2$ is to be determined.


The parabola $y = x^2$ assumes its absolute minimum at $x = 0$ so that it is clearly not a
monotonic function. The random event '$Y \le y_0$' happens if and only if (Figure 4.6)


$$-\sqrt{y_0} \le X \le +\sqrt{y_0}.$$


Hence, $F_Y(y) = P(-\sqrt{y} \le X \le +\sqrt{y})$ so that, by equation (2.5), page 42,
$$F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}).$$


Differentiation yields
$$f_Y(y) = \frac{1}{2\sqrt{y}} \left[ f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right], \quad 0 < y < \infty.$$
In particular, for an N(0, 1)-distributed random variable X, the density of $Y = X^2$ is


$$f_Y(y) = \frac{1}{2\sqrt{y}} \left[ \frac{1}{\sqrt{2\pi}}\, e^{-y/2} + \frac{1}{\sqrt{2\pi}}\, e^{-y/2} \right] = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}, \quad 0 < y < \infty.$$


This is the density of a $\chi^2$-distribution (chi-square distribution) with one degree of
freedom. $\square$


Note A random variable X has a chi-square distribution with n degrees of freedom (or,
equivalently, with parameter n) if it has density


$$f_X(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{\frac{n}{2} - 1}\, e^{-x/2}, \quad 0 < x < \infty, \quad n = 1, 2, \ldots, \tag{4.9}$$
where the Gamma function $\Gamma(\cdot)$ is defined by formula (2.75), page 75.
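
The two-step procedure of Example 4.5 can be checked numerically for a standard normal X. This is a sketch under assumptions: the closed form $F_X(\sqrt{y}) - F_X(-\sqrt{y}) = \mathrm{erf}(\sqrt{y/2})$ for the N(0, 1)-distribution is used, and seed and sample size are arbitrary.

```python
import math
import random

def cdf_y(y):
    """Step 1: F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)) = erf(sqrt(y/2)) for N(0,1)."""
    return math.erf(math.sqrt(y / 2.0))

def density_y(y):
    """Step 2: density of Y = X^2, exp(-y/2) / sqrt(2*pi*y), from Example 4.5."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

# The derivative of the distribution function should reproduce the density.
h = 1e-5
numeric_derivative = (cdf_y(1.0 + h) - cdf_y(1.0 - h)) / (2.0 * h)

# Simulation: square standard normal random numbers and compare frequencies.
rng = random.Random(7)            # arbitrary fixed seed
n = 200_000
ys = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
checks = {}
for y0 in (0.5, 1.0, 2.0):
    empirical = sum(1 for y in ys if y <= y0) / n
    checks[y0] = (empirical, cdf_y(y0))

print(numeric_derivative, density_y(1.0))
print(checks)
```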


Mean Value of Y According to formula (2.51), the mean value E(Y) of a random
variable Y with density $f_Y(y)$ is


$$E(Y) = \int_{R_Y} y\, f_Y(y)\, dy.$$
If Y has structure $Y = h(X)$ with a strictly monotone function $y = h(x)$, then, by (4.3),


$$E(Y) = \int_{R_Y} y\, f_X(h^{-1}(y)) \left| \frac{dx}{dy} \right| dy.$$
Substituting $y = h(x)$ and $x = h^{-1}(y)$, respectively, yields

$$E(Y) = \int_{R_X} h(x)\, f_X(x)\, dx. \tag{4.10}$$
Hence, knowledge of $f_Y(y)$ is not necessary for obtaining E(Y). We already made use
of this in chapters 2 and 3 when determining moments, variance, and other parameters.
Continuation of Example 4.3 The mean kinetic energy E(Y) of the body will now be
calculated by formula (4.10). Since the density of X is

$$f_X(x) = \frac{1}{0.4} = 2.5, \quad 0.8 \le x \le 1.2,$$

the mean kinetic energy is


$$E(Y) = E\left(\tfrac{1}{2} m X^2\right) = \tfrac{1}{2}\, m\, E(X^2) = \tfrac{1}{2}\, m \int_{0.8}^{1.2} x^2 \cdot 2.5\, dx$$


$$= 1.25\, m \left[ \frac{x^3}{3} \right]_{0.8}^{1.2} = \frac{1.25}{3}\, m \left[ 1.2^3 - 0.8^3 \right] \approx 0.506\, m. \qquad \square$$
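
Both routes to the mean kinetic energy of Example 4.3 can be compared numerically. The sketch below assumes an arbitrary mass m = 2 and uses a simple midpoint rule (the helper `integrate` is hypothetical); it evaluates formula (4.10) and, for comparison, the integral of $y f_Y(y)$ over $R_Y$.

```python
import math

def integrate(g, lo, hi, steps=100_000):
    """Midpoint rule on [lo, hi] (hypothetical helper, not from the text)."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width

m = 2.0   # mass of the body, arbitrary illustrative value

# Route 1, formula (4.10): integrate h(x) * f_X(x) over R_X = [0.8, 1.2].
mean_via_x = integrate(lambda x: 0.5 * m * x * x * 2.5, 0.8, 1.2)

# Route 2: integrate y * f_Y(y) over R_Y = [0.32 m, 0.72 m],
# with f_Y(y) = 2.5 / sqrt(2 m y) as obtained from (4.3).
mean_via_y = integrate(lambda y: y * 2.5 / math.sqrt(2.0 * m * y), 0.32 * m, 0.72 * m)

# Both ratios should agree with the factor of m found in the example.
print(mean_via_x / m, mean_via_y / m)
```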


Continuation of Example 4.4 The mean x-coordinate of the random landing point
$X = a \sin 2\alpha$ of a drop of water will be calculated by formula (4.10). Since the density
of $\alpha$ is given by (4.8),


$$E(X) = a \int_0^{\pi/2} (\sin 2\alpha)\, \frac{2}{\pi}\, d\alpha = \frac{2a}{\pi} \left[ -\frac{1}{2} \cos 2\alpha \right]_0^{\pi/2} = \frac{2a}{\pi} \approx 0.6366\, a. \qquad \square$$
4.1.2 Random Numbers


Computers, even scientific calculators, are equipped with software for the generation
of random numbers, i.e., a computer can randomly pick numbers from the interval
[0,1]. More exactly, a computer can generate or simulate arbitrarily frequently and
independently of each other realizations of a random variable X, which has a uniform
distribution in the interval [0,1]. The result of n successive, independent simulations
is a set of numbers

$$\{x_1, x_2, \ldots, x_n\}, \quad x_i \in [0, 1]. \tag{4.11}$$


This set is called a sequence of random numbers or, more precisely, a sequence of
random numbers generated from a [0,1]-uniform distribution. In applications,
however, one will only in rare cases directly need random numbers simulated from a
uniform distribution. Hence the following problem needs to be solved:


Problem Let X have a uniform distribution in the interval [0,1]. Does there exist a
function $y = h(x)$, $0 \le x \le 1$, with the property that the random variable $Y = h(X)$ has a
desired distribution function F(y)?

By assumption, the distribution function of X is


$$F_X(x) = \begin{cases} 0 & \text{for } x < 0, \\ x & \text{for } 0 \le x \le 1, \\ 1 & \text{for } x > 1. \end{cases} \tag{4.12}$$


The function which solves the problem is simply $h = F^{-1}$, where $F^{-1}$ is the inverse
function of F, i.e., $F^{-1}(F(y)) = y$ for all $y \in R_Y$. This can be seen as follows:
For $Y = F^{-1}(X)$, taking into account (4.12),

$$P(Y \le y) = P(F^{-1}(X) \le y) = P(X \le F(y)) = F_X(F(y)) = F(y).$$
Thus, $Y = F^{-1}(X)$ has indeed the desired distribution function $F_Y(y) = F(y)$. This
result is summarized in the following theorem (compare to formula (4.1)):


Theorem 4.1 Let X be a uniformly in [0, 1] distributed random variable with distribution
function $F_X(x)$, and F(y) be a strictly monotone, but otherwise arbitrary distribution
function. Then the random variable $Y = F^{-1}(X)$ has distribution function


$$F_Y(y) = F(y).$$


Vice versa, if X is a random variable with distribution function $F_X(x)$, then $Y = F_X(X)$
has a uniform distribution in [0, 1]. $\blacksquare$


Now it is obvious how to generate from the sequence of random numbers (4.11),
simulated from a [0, 1]-uniform distribution, a sequence of random numbers which
is simulated from a probability distribution given by $F_Y(y)$:


$$\{y_1, y_2, \ldots, y_n\} \quad \text{with} \quad y_i = F^{-1}(x_i), \quad i = 1, 2, \ldots, n. \tag{4.13}$$
The set of numbers (4.13) will be called simply a sequence of random numbers from
a probability distribution given by $F_Y(y)$. If, for instance, $F_Y(y)$ is the distribution
function of a Weibull distributed random variable, then (4.13) is called a sequence of
Weibull distributed random numbers; analogously, there are sequences of normally
distributed random numbers and so on.


Of course, these numbers are not random at all, but are realizations of a random
variable Y with distribution function $F_Y(y)$. More precisely: The sequence (4.13) of real
numbers $y_1, y_2, \ldots, y_n$ is generated by the outcomes of n independent repetitions of a
random experiment with random outcome Y.


In the literature, the terminology 'to simulate a sequence of random numbers from a
given distribution' is used equivalently to 'simulate a random variable with a given
probability distribution', e.g., to 'simulate an exponentially distributed random
variable' or to 'simulate a normally distributed random variable'.


Example 4.6 Based on a random variable X, which has a uniform distribution in the
interval [0,1], a random variable Y is to be generated, which has an exponential
distribution with parameter $\lambda$:


$$F(y) = P(Y \le y) = 1 - e^{-\lambda y}, \quad y \ge 0.$$
First, the equation $x = 1 - e^{-\lambda y}$ has to be solved for y:


$$y = F^{-1}(x) = -\frac{1}{\lambda} \ln(1 - x), \quad 0 \le x < 1.$$


Hence, the random variable
$$Y = F^{-1}(X) = -\frac{1}{\lambda} \ln(1 - X)$$


has an exponential distribution with parameter $\lambda$. Thus, if the sequence (4.11) of
uniformly in [0, 1]-distributed random numbers is given, the corresponding sequence of
exponentially with parameter $\lambda$ distributed random numbers is


$$\{y_1, y_2, \ldots, y_n\}, \quad \text{where} \quad y_i = F^{-1}(x_i) = -\frac{1}{\lambda} \ln(1 - x_i), \quad i = 1, 2, \ldots, n. \qquad \square$$
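
The transformation of Example 4.6 is easily tried out. The following is a minimal sketch, assuming $\lambda = 2$, an arbitrary seed, and the uniform random numbers supplied by Python's standard `random` module; the function name `exponential_from_uniform` is chosen here for illustration.

```python
import math
import random

def exponential_from_uniform(x, lam):
    """Inverse transformation y = -(1/lam) * ln(1 - x) for 0 <= x < 1."""
    return -math.log(1.0 - x) / lam

lam = 2.0                         # parameter of the exponential distribution (assumed)
rng = random.Random(42)           # arbitrary fixed seed
n = 100_000

uniforms = [rng.random() for _ in range(n)]            # a sequence of type (4.11)
samples = [exponential_from_uniform(x, lam) for x in uniforms]

sample_mean = sum(samples) / n
share_below_one = sum(1 for y in samples if y <= 1.0) / n

# Compare with the mean 1/lam and with F(1) = 1 - exp(-lam).
print(sample_mean, 1.0 / lam)
print(share_below_one, 1.0 - math.exp(-lam))
```

Since `random()` returns values in [0, 1), the argument of the logarithm never becomes 0.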
It is not always possible to find an explicit formula for the inverse function $F^{-1}$ of F.
For instance, if F(y) is the distribution function of a normal distribution with
parameters $\mu$ and $\sigma^2$, then the equation


$$x = F(y) = \frac{1}{\sqrt{2\pi}\, \sigma} \int_{-\infty}^{y} e^{-\frac{(u - \mu)^2}{2\sigma^2}}\, du$$
cannot be explicitly solved for y. However, given the $x_i$, the numerical calculation of
the corresponding $y_i$, i.e., the numerical calculation of a sequence of normally
distributed random numbers, is no problem at all.
Generalization Let Y and Z be two random variables with strictly monotone distribution
functions $F_Y(y)$ and $F_Z(z)$, respectively. Is there a function $z = h(y)$ so that


$$Z = h(Y)?$$
This function can be derived by twofold application of theorem 4.1: According to this
theorem, the random variable $X = F_Y(Y)$ has a uniform distribution in [0,1]. Hence,
again by this theorem, the random variable $F_Z^{-1}(X)$ has distribution function $F_Z$ so
that the desired function $z = h(y)$ is
$$z = F_Z^{-1}(F_Y(y)).$$

Thus, if $Z = F_Z^{-1}(F_Y(Y))$, then Y has distribution function $F_Y(y)$, and Z has distribution
function $F_Z(z)$.


Example 4.7 Let Y and Z be two random variables with distribution functions
$$F_Y(y) = 1 - e^{-y}, \ y \ge 0, \quad \text{and} \quad F_Z(z) = \sqrt{z}, \ 0 \le z \le 1.$$
For which function $z = h(y)$ is $Z = h(Y)$?
The random variable
$$X = F_Y(Y) = 1 - e^{-Y}$$
with realizations x, $0 \le x \le 1$, is uniformly distributed in [0,1]. Moreover,
$$F_Z^{-1}(x) = x^2.$$
Hence, the desired function is
$$z = h(y) = (1 - e^{-y})^2, \quad y \ge 0,$$
so that there is the following relationship between Y and Z:


$$Z = (1 - e^{-Y})^2. \qquad \square$$
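
The relationship found in Example 4.7 can be verified by simulation. In this sketch (seed and sample size are arbitrary assumptions) exponentially distributed random numbers are drawn with the standard-library function `random.expovariate`, transformed by h, and the relative frequencies of $Z \le z$ are compared with $F_Z(z) = \sqrt{z}$.

```python
import math
import random

def h(y):
    """z = h(y) = (1 - exp(-y))^2, the function derived in Example 4.7."""
    return (1.0 - math.exp(-y)) ** 2

rng = random.Random(99)           # arbitrary fixed seed
n = 100_000

ys = [rng.expovariate(1.0) for _ in range(n)]   # Y exponential with parameter 1
zs = [h(y) for y in ys]

# Empirical distribution function of Z at a few points, next to sqrt(z).
checks = {}
for z0 in (0.04, 0.25, 0.64):
    empirical = sum(1 for z in zs if z <= z0) / n
    checks[z0] = (empirical, math.sqrt(z0))

print(checks)
```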


Discrete Random Variables Sequences of random numbers of type (4.11), simulated
from a uniform distribution in [0,1], can also be used to simulate sequences of
random numbers from discrete random variables.
For instance, if Y is a random variable with range $R_Y = \{-3, -1, +1, +3\}$ and probability
distribution

$$\{P(Y = -3) = 0.2,\ P(Y = -1) = 0.1,\ P(Y = +1) = 0.4,\ P(Y = +3) = 0.3\},$$
then sequences of random numbers from this probability distribution can be simulated
from a random variable X, which has a uniform distribution in [0,1], as follows:


$$Y = \begin{cases} -3 & \text{for } 0.0 \le X \le 0.2, \\ -1 & \text{for } 0.2 < X \le 0.3, \\ +1 & \text{for } 0.3 < X \le 0.7, \\ +3 & \text{for } 0.7 < X \le 1.0. \end{cases}$$
This representation of Y is not unique, since the assignment of subintervals of [0,1]
to the values of Y only requires that the lengths of the subintervals correspond to the
respective probabilities. So, another, equivalent representation of Y would be, e.g.,


$$Y = \begin{cases} -3 & \text{for } 0.8 < X \le 1.0, \\ -1 & \text{for } 0.7 < X \le 0.8, \\ +1 & \text{for } 0.0 \le X \le 0.4, \\ +3 & \text{for } 0.4 < X \le 0.7. \end{cases}$$
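
The first of the two representations can be written as a short lookup. The following is a sketch only: the right endpoints of the subintervals are stored in a list and searched with `bisect_left` from the standard library; the function name, seed, and sample size are assumptions made for illustration.

```python
import random
from bisect import bisect_left
from collections import Counter

values = [-3, -1, +1, +3]
# Right endpoints of the subintervals [0, 0.2], (0.2, 0.3], (0.3, 0.7], (0.7, 1.0].
right_endpoints = [0.2, 0.3, 0.7, 1.0]

def discrete_from_uniform(x):
    """Return the value of Y whose subinterval of [0, 1] contains x."""
    # bisect_left finds the first endpoint that is >= x.
    return values[bisect_left(right_endpoints, x)]

rng = random.Random(5)            # arbitrary fixed seed
n = 100_000
counts = Counter(discrete_from_uniform(rng.random()) for _ in range(n))
frequencies = {v: counts[v] / n for v in values}

# Relative frequencies should be close to 0.2, 0.1, 0.4, 0.3.
print(frequencies)
```

Any other assignment of subintervals with the same lengths would serve equally well.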


The method of simulating sequences of random numbers from a given distribution
based on sequences of uniformly in [0,1]-distributed random numbers is, for obvious
reasons, called the inverse transformation method. There are a couple of other
simulation techniques for generating sequences of random numbers, e.g., the failure or
hazard rate method and the rejection method. They do not, however, fit in the
framework of section 4.1.


One question still needs to be answered: How are sequences of random numbers from
a [0,1]-uniform distribution generated?


It can be done manually by repeating a Laplace random experiment (page 12) with
outcomes 0, 1, ..., 9 several times. For instance, 10 balls, with respective numbers 0, 1,
..., 9 attached to them, are put into a bowl. A ball is randomly selected. Its number $i_1$
is the first decimal. The ball is returned to the bowl. After shaking it, a second ball is
randomly drawn from the bowl; its number $i_2$ is the second decimal, and so on.
After having done this m times, the number

$$0.i_1 i_2 \ldots i_m$$
has been generated. After having repeated this procedure n times, a sequence of n in
[0,1] uniformly distributed random numbers has been simulated. Or, by repeating the
Laplace experiment 'flipping a coin' with outcomes '1' (head) or '0' (tail) m times, one
obtains a binary number with m digits. Decades ago, researchers would obtain [0,1]-
uniformly distributed sequences of random numbers from voluminous tables of random
numbers.
Note In what follows, the attribute '[0,1]-uniform(ly)' will be omitted.


But how are sequences of random numbers generated by a computer nowadays? The answer is quite surprising: usually by deterministic algorithms. From the numerical point of view, these algorithms are most efficient. But they only yield sequences of pseudo-random numbers. Extensive statistical tests, however, have established that sequences of pseudo-random numbers, when properly generated, have the same statistical properties as sequences of (genuine) random numbers, i.e., sequences of pseudo-random numbers and sequences of random numbers cannot be distinguished from each other by these tests.


There are three basic properties which any sequence of (pseudo-)random numbers $x_1, x_2, \ldots, x_n$ must fulfill for sufficiently large n:

1) The $x_1, x_2, \ldots, x_n$ are uniformly distributed in [0,1] in the sense that all subintervals of [0,1] of the same length contain about the same number of the $x_i$.

2) Within the sequence $x_1, x_2, \ldots, x_n$ no dependencies can be found. In particular, the structure of any subsequence (denoted as ss) of $x_1, x_2, \ldots, x_n$ does not contain any information on any other subsequence of $x_1, x_2, \ldots, x_n$ which is disjoint to ss.

3) The sequence $x_1, x_2, \ldots, x_n$ is not periodic, i.e., there is no positive integer p with the property that there exists an element $x_p$ of this sequence with $x_p = x_1$ and after $x_p$ the numbers develop in the same way as from the start, i.e.,

$$x_1, x_2, \ldots, x_p = x_1,\; x_{p+1} = x_2,\; x_{p+2} = x_3,\; \ldots,\; x_{2p-1} = x_1,\; \ldots$$

In this case, the sequence $x_1, x_2, \ldots, x_n$ would consist of identical subsequences of length p - 1 (only the last one is likely to be shorter).


Congruence Method This is probably the method most frequently used by random number generators (of computers) to produce sequences of pseudo-random numbers. Starting with a nonnegative integer $z_1$ (the seed), a sequence of pseudo-random numbers $x_1, x_2, \ldots$ with

$$x_i = z_i/m, \quad i = 1, 2, \ldots \tag{4.14}$$

is generated as follows:

$$z_{i+1} = (a z_i + b) \bmod m, \quad i = 1, 2, \ldots \tag{4.15}$$

with integers a, b, and m, which in this order are called factor, increment, and modulus, $a > 0$, $b \ge 0$, $m > 0$.


Note The relation $z = y \bmod m$ (read: z is equal to y modulo m) between three numbers z, y, and m means that z is the remainder which is left after the division of y by m.


Each of the numbers $z_i$ generated by (4.15) is an element of the set $\{0, 1, \ldots, m-1\}$. Thus, the sequence $\{z_i\}$ must have a finite period p with $p \le m$. Therefore, the algorithm has to start with an m as large as possible or necessary, respectively, so that with regard to the respective application a sufficiently long sequence of random numbers has been generated before the sequence reaches length p. The specialized literature gives recommendations on how to select the parameters a, b, and $z_1$ to make sure that the generated sequences of pseudo-random numbers have the properties 1 to 3 listed above.


If b = 0, then the algorithm is called the multiplicative congruence method, and for b > 0 it is called the linear congruence method.
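The recursion (4.15) together with (4.14) translates almost literally into code. The following is a minimal sketch, not a production-quality generator; the function names `lcg`, `first_pseudo_randoms`, and `period` are hypothetical, and the parameter values are those of Example 4.8.

```python
def lcg(seed, a, b, m):
    """Yield z_1, z_2, ... with z_{i+1} = (a*z_i + b) mod m, see (4.15)."""
    z = seed
    while True:
        yield z
        z = (a * z + b) % m

def first_pseudo_randoms(seed, a, b, m, count):
    """Return the first `count` integers z_i and the numbers x_i = z_i/m."""
    gen = lcg(seed, a, b, m)
    zs = [next(gen) for _ in range(count)]
    return zs, [z / m for z in zs]

def period(seed, a, b, m):
    """Length of the cycle that the sequence {z_i} eventually enters."""
    first_seen = {}
    z, i = seed, 0
    while z not in first_seen:
        first_seen[z] = i
        z = (a * z + b) % m
        i += 1
    return i - first_seen[z]

zs, xs = first_pseudo_randoms(seed=101, a=21, b=53, m=256, count=8)
print(zs)                          # → [101, 126, 139, 156, 1, 74, 71, 8]
print([round(x, 5) for x in xs])
print(period(101, 21, 53, 256))    # cannot exceed m = 256
```

The `period` helper makes the bound $p \le m$ tangible: since every $z_i$ lies in $\{0, 1, \ldots, m-1\}$, a value must repeat after at most m steps.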


Example 4.8 Let a = 21, b = 53, m = 256, and $z_1 = 101$. The corresponding recursive equations (4.15) are

$$z_{i+1} = (21 z_i + 53) \bmod 256, \quad i = 1, 2, \ldots \tag{4.16}$$

The first seven equations are

$$\begin{aligned}
z_2 &= (21 \cdot 101 + 53) \bmod 256 = 2174 \bmod 256 = 126, \\
z_3 &= (21 \cdot 126 + 53) \bmod 256 = 2699 \bmod 256 = 139, \\
z_4 &= (21 \cdot 139 + 53) \bmod 256 = 2972 \bmod 256 = 156, \\
z_5 &= (21 \cdot 156 + 53) \bmod 256 = 3329 \bmod 256 = 1, \\
z_6 &= (21 \cdot 1 + 53) \bmod 256 = 74 \bmod 256 = 74, \\
z_7 &= (21 \cdot 74 + 53) \bmod 256 = 1607 \bmod 256 = 71, \\
z_8 &= (21 \cdot 71 + 53) \bmod 256 = 1544 \bmod 256 = 8.
\end{aligned}$$

The corresponding first eight numbers in the sequence of pseudo-random numbers calculated by $x_i = z_i/256$ are

$$x_1 = 0.39453;\; x_2 = 0.49219;\; x_3 = 0.54297;\; x_4 = 0.60938;$$

$$x_5 = 0.00391;\; x_6 = 0.28906;\; x_7 = 0.27734;\; x_8 = 0.03125.$$

Of course, with a sequence of eight pseudo-random numbers one cannot confirm that the sequence generated by (4.16) and (4.14) satisfies the three basic properties above. This example and the following one can only explain the calculation steps. □


Mid-Square Method From a 2k-figure integer $z_i$ one generates the subsequent number $z_{i+1}$ by identifying it with the middle 2k figures of $z_i^2$. If $z_i^2$ has less than 4k figures, then the missing ones are replaced with zeros at the front of $z_i^2$. The figures of $z_i$ yield the decimals of the pseudo-random number $x_i$ after the point. The specialized literature gives hints on how to select $z_1$ and k so that the generated sequence of pseudo-random numbers $x_1, x_2, \ldots, x_n$ fulfills the basic properties 1 to 3 listed above.
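The mid-square step can be sketched as follows. The function name `mid_square` is hypothetical, and the code assumes that the square is padded with leading zeros to 4k figures before the middle 2k figures are taken; the seed 4567 with k = 2 is the one used in Example 4.9.

```python
def mid_square(seed, k, count):
    """Return the first `count` integers z_i of the mid-square method.

    Each z_i is treated as a 2k-figure integer; z_{i+1} consists of the
    middle 2k figures of z_i**2, written with 4k figures (zero-padded).
    """
    width = 2 * k
    z = seed
    zs = []
    for _ in range(count):
        zs.append(z)
        squared = str(z * z).zfill(2 * width)  # pad to 4k figures
        z = int(squared[k:k + width])          # middle 2k figures
    return zs

zs = mid_square(seed=4567, k=2, count=7)
xs = [z / 10 ** 4 for z in zs]                 # decimals after the point
print(zs)  # → [4567, 8574, 5134, 3579, 8092, 4804, 784]
print(xs)
```

Note that the integer 784 stands for the four-figure string 0784, so that the last pseudo-random number is 0.0784.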


Example 4.9 Let k = 2 and $z_1 = 4567$. The first 7 numbers of the corresponding sequences $\{z_i\}$ and $\{x_i\}$ are

 i    z_i     z_i^2       x_i
 1    4567    20857489    0.4567
 2    8574    73513476    0.8574
 3    5134    26357956    0.5134
 4    3579    12809241    0.3579
 5    8092    65480464    0.8092
 6    4804    23078416    0.4804
 7    0784    00614656    0.0784


It is obvious that after sufficiently many steps one must return to an $x_i$ already obtained before. This is because the total number of 4-figure integers is 10 000. Hence, with regard to this example, the generated sequence $x_1, x_2, \ldots$ of pseudo-random numbers must have a period p not exceeding 10 000. □

The generation of random numbers is the basis for computer-aided modeling (simulation) of complex stochastic systems in industry, the economy, the military, science, the humanities, or other areas in order to determine properties or relevant parameters of these systems. Such properties or parameters are, for instance, productivity, stability, availability, safety, efficiency criteria, mean values, variances, and state probabilities. By computer-aided simulation, systems can be qualitatively and quantitatively evaluated which, in view of their complexity or a lack of input data and other information, cannot be analyzed by analytical methods alone. Simulation considerably reduces the need for costly and time-consuming experiments, which otherwise would have to be carried out under real-life conditions. The application of computer-aided simulation is facilitated by special software packages.


4.2 FUNCTIONS OF SEVERAL RANDOM VARIABLES 


4.2.1 Introduction 


A rectangle with side lengths a and b has the area A = ab. In view of random measurement errors one has only the random side lengths X and Y, which give for A the random estimate A = XY. If this rectangle is the base of a cylinder with random height Z, then a random estimate of its volume is V = AZ = XYZ.


If, in view of random fluctuations, only the random values V and R are given instead of the exact values v and r of voltage and resistance, and if the conditions for Ohm's law are fulfilled, then instead of the exact value of the corresponding amperage i = v/r one has only the random estimate I = V/R.


If an investor has per year the random profits (losses) X, Y, and Z from shares, bonds, and funds, respectively, then her/his annual total profit (loss) will be P = X + Y + Z.


If the signal sin Y with random Y has been sent and has its amplitude (= 1) randomly distorted to X during transmission, then the receiver obtains the message


X sin Y.


If a system consists of two subsystems with respective random lifetimes X and Y and fails as soon as the first subsystem fails, then its lifetime is min(X, Y). If this system only fails when both subsystems are down, then its lifetime is max(X, Y). These are examples of functions of two or more random variables, which motivate the subject of the rest of this chapter.


The following sections 4.2.2 to 4.2.6 essentially deal with functions of two random variables Z = h(X, Y). If the generalization to functions of an arbitrary number of random variables $Z = h(X_1, X_2, \ldots, X_n)$ is straightforward, then the corresponding results will be given. This is usually only the case when $X_1, X_2, \ldots, X_n$ are independent.


4.2.2 Mean Value


Let the random vector (X, Y) have the joint density $f_{X,Y}(x,y)$ and range $R_{X,Y}$ given by the normal region with regard to the x-axis

$$R_{X,Y} = \{(x,y);\; a \le x \le b,\; y_1(x) \le y \le y_2(x)\}$$

(Figure 3.1, page 123). Let z = h(x,y) be a function on $R_{X,Y}$ and Z = h(X, Y). Then, by formula (3.59), the mean value of Z, provided it exists, is defined as

$$E(Z) = \int_a^b \int_{y_1(x)}^{y_2(x)} h(x,y)\, f_{X,Y}(x,y)\, dy\, dx. \tag{4.17}$$

Since outside of $R_{X,Y}$ the joint density is 0, this mean value can also be written as

$$E(Z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(x,y)\, f_{X,Y}(x,y)\, dy\, dx.$$

For the calculation of E(Z) this formula may not help very much, since in each case the bounds prescribed by $R_{X,Y}$ have to be inserted.


If the random variables X and Y are discrete with respective ranges $R_X = \{x_0, x_1, \ldots\}$, $R_Y = \{y_0, y_1, \ldots\}$, and joint distribution

$$\{r_{ij} = P(X = x_i, Y = y_j);\; i, j = 0, 1, \ldots\},$$

then

$$E(Z) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} h(x_i, y_j)\, r_{ij}. \tag{4.18}$$


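Example 4.10 The random vector (X, Y) has a uniform distribution in the rectangle $R_{X,Y} = \{0 \le x \le \pi,\; 0 \le y \le 1\}$. The mean value of the random variable Z = X sin(XY) has to be calculated.


Since a rectangle is a normal region, formula (4.17) is directly applicable with $f_{X,Y}(x,y) = 1/\pi$ for all $(x,y) \in R_{X,Y}$ and $h(x,y) = x \sin(xy)$:

$$E(Z) = \int_0^{\pi} \int_0^1 x \sin(xy)\, \frac{1}{\pi}\, dy\, dx = \frac{1}{\pi} \int_0^{\pi} \left( \int_0^1 x \sin(xy)\, dy \right) dx$$

$$= \frac{1}{\pi} \int_0^{\pi} x \left[ -\frac{\cos(xy)}{x} \right]_{y=0}^{y=1} dx = \frac{1}{\pi} \int_0^{\pi} (1 - \cos x)\, dx = \frac{1}{\pi} \big[ x - \sin x \big]_0^{\pi}.$$

Hence, E(Z) = 1. □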
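The result of Example 4.10 can be checked numerically by evaluating the double integral (4.17) with a simple midpoint rule. The sketch below assumes the rectangle $0 \le x \le \pi$, $0 \le y \le 1$ with constant density $1/\pi$; the function name `expected_value_midpoint` and the grid size are illustrative choices, not part of the text.

```python
import math

def expected_value_midpoint(h, density, x_range, y_range, n=400):
    """Approximate E(h(X, Y)) over a rectangle by an n-by-n midpoint rule."""
    (a, b), (c, d) = x_range, y_range
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += h(x, y) * density(x, y)
    return total * dx * dy

ez = expected_value_midpoint(
    h=lambda x, y: x * math.sin(x * y),
    density=lambda x, y: 1.0 / math.pi,   # uniform density on the rectangle
    x_range=(0.0, math.pi),
    y_range=(0.0, 1.0),
)
print(ez)  # should be close to the analytical value 1
```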


Example 4.11 A target, which is positioned at the origin (0,0) of the (x, y)-coordinate system, is subject to permanent artillery fire. The random x-coordinate X and the random y-coordinate Y of the impact marks of the shells are independent and identically $N(0, \sigma^2)$-distributed random variables. (The assumption E(X) = E(Y) = 0 means that there are no systematic deviations from the target.)


Let Z be the random distance of an impact mark to the target (origin). The aim is to determine the probability distribution of Z and E(Z).

Figure 4.7 Impact mark and polar coordinates


By (2.81) and (3.13), the joint probability density of (X, Y) is

$$f_{X,Y}(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{y^2}{2\sigma^2}} = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}, \quad -\infty < x, y < +\infty.$$

Since the distance of the impact mark to the target is $Z = \sqrt{X^2 + Y^2}$, the distribution function of Z is in principle given by

$$F_Z(z) = P(Z \le z) = \iint_{\{(x,y);\; x^2+y^2 \le z^2\}} \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}\, dx\, dy. \tag{4.19}$$

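To facilitate the evaluation of this double integral, a transition is made to polar coordinates (special curvilinear coordinates, page 123) according to Figure 4.7:

$$x = r\cos\varphi,\quad y = r\sin\varphi \quad\text{or}\quad r = \sqrt{x^2+y^2},\quad \varphi = \arctan\frac{y}{x}$$

with

$$\frac{\partial x}{\partial r} = \cos\varphi,\quad \frac{\partial x}{\partial \varphi} = -r\sin\varphi,\quad \frac{\partial y}{\partial r} = \sin\varphi,\quad \frac{\partial y}{\partial \varphi} = r\cos\varphi.$$

The corresponding functional determinant is (page 123)

$$\left| \frac{\partial(x,y)}{\partial(r,\varphi)} \right| = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial y}{\partial r} \\[2mm] \dfrac{\partial x}{\partial \varphi} & \dfrac{\partial y}{\partial \varphi} \end{vmatrix} = \begin{vmatrix} \cos\varphi & \sin\varphi \\ -r\sin\varphi & r\cos\varphi \end{vmatrix} = r(\cos\varphi)^2 + r(\sin\varphi)^2 = r.$$

Integrating over the full disc $\{(x,y);\; x^2+y^2 \le z^2\}$ in (4.19) is, in polar coordinates, equivalent to integrating over the area $[0 \le r \le z,\; 0 \le \varphi \le 2\pi]$. By (3.17), page 123, the integral (4.19) reduces to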


$$F_Z(z) = \int_0^z \int_0^{2\pi} \frac{1}{2\pi\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}\, r\, d\varphi\, dr = \frac{1}{\sigma^2} \int_0^z r\, e^{-\frac{r^2}{2\sigma^2}}\, dr = 1 - e^{-\frac{z^2}{2\sigma^2}}, \quad z \ge 0.$$

This is a Weibull distribution with parameters $\beta = 2$ and $\theta = \sqrt{2}\,\sigma$, i.e., the random variable Z is Rayleigh-distributed. Hence, by formula (2.78), its mean value is

$$E(Z) = \sqrt{2}\,\sigma\,\Gamma(1.5) \approx 1.2533\,\sigma. \qquad \square$$


4.2.3 Product of Two Random Variables


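Let (X, Y) be a random vector with joint probability density $f_{X,Y}(x,y)$, and

$$Z = XY.$$

The distribution function of Z is given by

$$F_Z(z) = \iint_{\{(x,y);\; xy \le z\}} f_{X,Y}(x,y)\, dx\, dy$$

with (see Figure 4.8)

$$\{(x,y);\; xy \le z\} = \left\{-\infty < x \le 0,\; \tfrac{z}{x} \le y < \infty\right\} \cup \left\{0 \le x < \infty,\; -\infty < y \le \tfrac{z}{x}\right\}.$$

Hence,

$$F_Z(z) = \int_{-\infty}^{0} \int_{z/x}^{\infty} f_{X,Y}(x,y)\, dy\, dx + \int_0^{\infty} \int_{-\infty}^{z/x} f_{X,Y}(x,y)\, dy\, dx.$$

Differentiation with regard to z yields the probability density of Z:

$$f_Z(z) = \int_{-\infty}^{0} \left(-\frac{1}{x}\right) f_{X,Y}\!\left(x, \frac{z}{x}\right) dx + \int_0^{\infty} \frac{1}{x}\, f_{X,Y}\!\left(x, \frac{z}{x}\right) dx.$$

This representation can be simplified to

$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|}\, f_{X,Y}\!\left(x, \frac{z}{x}\right) dx, \quad z \in (-\infty, \infty). \tag{4.20}$$

For nonnegative X and Y,

$$F_Z(z) = \int_0^{\infty} \int_0^{z/x} f_{X,Y}(x,y)\, dy\, dx, \quad z \ge 0,$$

$$f_Z(z) = \int_0^{\infty} \frac{1}{x}\, f_{X,Y}\!\left(x, \frac{z}{x}\right) dx, \quad z \ge 0. \tag{4.21}$$


Figure 4.8 Derivation of the distribution function of a product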


Example 4.12 The random vector (X, Y) has the joint density

$$f_{X,Y}(x,y) = 6x^2 y, \quad 0 \le x, y \le 1.$$

Since both X and Y are nonnegative, formula (4.21) can be applied to determine the density of Z = XY. Since the integrand is positive only for $z/x \le 1$, i.e., for $x \ge z$,

$$f_Z(z) = \int_z^1 \frac{1}{x} \left( 6x^2 \cdot \frac{z}{x} \right) dx = 6z(1 - z), \quad 0 \le z \le 1.$$

The calculation of the mean value of Z yields

$$E(Z) = \int_0^1 z\, [6z(1 - z)]\, dz = 6 \left[ \frac{z^3}{3} - \frac{z^4}{4} \right]_0^1 = \frac{1}{2}.$$

The marginal densities of (X, Y) are

$$f_X(x) = 3x^2,\; 0 \le x \le 1, \quad\text{and}\quad f_Y(y) = 2y,\; 0 \le y \le 1.$$

Hence, $f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ so that X and Y are independent. □
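Formula (4.21) lends itself to a direct numerical check. The following sketch approximates the integral in (4.21) for the joint density of Example 4.12 by a midpoint rule and compares the result with the closed form 6z(1 - z). The function names are hypothetical, and the upper integration limit 1 is an assumption that is valid here only because X does not exceed 1.

```python
def joint_density(x, y):
    """Joint density of Example 4.12: 6*x^2*y on the unit square, else 0."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return 6.0 * x * x * y
    return 0.0

def product_density(z, joint, upper=1.0, steps=20_000):
    """Midpoint approximation of (4.21): integral of (1/x) f(x, z/x) dx."""
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx          # midpoints are strictly positive
        total += joint(x, z / x) / x
    return total * dx

for z in (0.25, 0.5, 0.75):
    # numerical value of (4.21) next to the closed form 6z(1 - z)
    print(z, product_density(z, joint_density), 6.0 * z * (1.0 - z))
```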


4.2.4 Ratio of Two Random Variables 


Let (X, Y) be a random vector with joint probability density $f_{X,Y}(x,y)$, and

$$Z = \frac{Y}{X}.$$

The distribution function of Z is given by

$$F_Z(z) = \iint_{\{(x,y);\; y/x \le z\}} f_{X,Y}(x,y)\, dx\, dy$$

with (Figure 4.9)

$$\left\{(x,y);\; \tfrac{y}{x} \le z\right\} = \{-\infty < x \le 0,\; zx \le y < \infty\} \cup \{0 \le x < \infty,\; -\infty < y \le zx\}.$$

Hence,

$$F_Z(z) = \int_{-\infty}^{0} \int_{zx}^{\infty} f_{X,Y}(x,y)\, dy\, dx + \int_0^{\infty} \int_{-\infty}^{zx} f_{X,Y}(x,y)\, dy\, dx.$$

Differentiation with regard to z yields the probability density of Z:

$$f_Z(z) = \int_{-\infty}^{\infty} |x|\, f_{X,Y}(x, zx)\, dx. \tag{4.22}$$


Figure 4.9 Derivation of the distribution function of a ratio

In case of nonnegative X and Y,

$$F_Z(z) = \int_0^{\infty} \int_0^{zx} f_{X,Y}(x,y)\, dy\, dx, \quad z \ge 0,$$

$$f_Z(z) = \int_0^{\infty} x\, f_{X,Y}(x, zx)\, dx, \quad z \ge 0. \tag{4.23}$$
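
Example 4.13 The random vector (X, Y) has the joint density

$$f_{X,Y}(x,y) = \lambda\mu\, e^{-(\lambda x + \mu y)}, \quad x \ge 0,\; y \ge 0;\; \lambda > 0,\; \mu > 0. \tag{4.24}$$

The structure of this joint density implies that X and Y are independent and have exponential distributions with parameters $\lambda$ and $\mu$, respectively. Hence, the density of the ratio Z = Y/X is

$$f_Z(z) = \int_0^{\infty} x\, \lambda\mu\, e^{-(\lambda + \mu z)x}\, dx, \quad z \ge 0.$$

A slight transformation yields

$$f_Z(z) = \frac{\lambda\mu}{\lambda + \mu z} \int_0^{\infty} x\, (\lambda + \mu z)\, e^{-(\lambda + \mu z)x}\, dx, \quad z \ge 0.$$

The integral is the mean value of an exponentially distributed random variable with parameter $\lambda + \mu z$, i.e., it is equal to $1/(\lambda + \mu z)$. Therefore,

$$f_Z(z) = \frac{\lambda\mu}{(\lambda + \mu z)^2}, \quad z \ge 0, \tag{4.25}$$

$$F_Z(z) = 1 - \frac{\lambda}{\lambda + \mu z}, \quad z \ge 0.$$

This is the Lomax distribution (page 93). □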


Example 4.14 A system has the random lifetime (= time to failure) X. After a failure it is replaced with a new system. It takes Y time units to replace a failed system. Thus, within a (lifetime-replacement) cycle, the random fraction of time during which the system is operating is

$$A = \frac{X}{X + Y}.$$

A is called the availability of the system (in a cycle). Determining the distribution function of A can be reduced to determining the distribution function of the ratio Z = Y/X since

$$F_A(t) = P\!\left(\frac{X}{X+Y} \le t\right) = P\!\left(\frac{Y}{X} \ge \frac{1-t}{t}\right) = 1 - P\!\left(\frac{Y}{X} < \frac{1-t}{t}\right).$$

Hence,

$$F_A(t) = 1 - F_Z\!\left(\frac{1-t}{t}\right), \quad 0 < t \le 1.$$


Differentiation with respect to t yields the probability density of A:

$$f_A(t) = \frac{1}{t^2}\, f_Z\!\left(\frac{1-t}{t}\right), \quad 0 < t \le 1.$$

Specifically, if the joint density of (X, Y) is given by (4.24), then $f_Z(z)$ is given by (4.25) so that

$$f_A(t) = \frac{\lambda\mu}{[(\lambda - \mu)t + \mu]^2}, \quad F_A(t) = \frac{\lambda t}{(\lambda - \mu)t + \mu}, \quad 0 \le t \le 1.$$

For $\lambda \ne \mu$, the mean value of A is (easily obtained by formula (2.52), page 64)

$$E(A) = \frac{\mu}{\mu - \lambda} \left[ 1 + \frac{\lambda}{\mu - \lambda} \ln\frac{\lambda}{\mu} \right].$$

In particular, let $\lambda/\mu = 1/4$. Then $F_A(t) = t/(4 - 3t)$, and the probability that the system availability assumes a value between 0.7 and 0.9 is

$$P(0.7 \le A \le 0.9) = F_A(0.9) - F_A(0.7) = \frac{0.9}{1.3} - \frac{0.7}{1.9} = 0.324.$$

In view of $E(X) = 1/\lambda$ and $E(Y) = 1/\mu$, the assumption $\lambda/\mu = 1/4$ implies that the mean lifetime of the system is four times larger than its mean replacement time. Hence, one would expect that the mean availability of the system is $E(X)/(E(X) + E(Y)) = 0.8$. But the true value is lower: E(A) = 0.717.


If $\lambda = \mu$, then A is uniformly distributed over [0, 1]. In this case, E(A) = 1/2. □
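The figures of Example 4.14 can be reproduced with a few lines of code. The sketch below evaluates $F_A(t) = \lambda t/((\lambda - \mu)t + \mu)$ and $E(A) = \frac{\mu}{\mu - \lambda}\left[1 + \frac{\lambda}{\mu - \lambda}\ln\frac{\lambda}{\mu}\right]$ for exponentially distributed X and Y and compares the mean with a simulation. The function names, the concrete rates $\lambda = 1$, $\mu = 4$ (any pair with $\lambda/\mu = 1/4$ would do), the seed, and the sample size are assumptions made for illustration.

```python
import math
import random

def availability_cdf(t, lam, mu):
    """F_A(t) for independent exponential lifetime X and replacement time Y."""
    return lam * t / ((lam - mu) * t + mu)

def availability_mean(lam, mu):
    """E(A); for lam == mu the availability is uniform on [0, 1]."""
    if lam == mu:
        return 0.5
    return mu / (mu - lam) * (1.0 + lam / (mu - lam) * math.log(lam / mu))

lam, mu = 1.0, 4.0                       # lam/mu = 1/4
prob = availability_cdf(0.9, lam, mu) - availability_cdf(0.7, lam, mu)
mean = availability_mean(lam, mu)

rng = random.Random(2024)                # fixed seed for repeatability
sims = []
for _ in range(200_000):
    x = rng.expovariate(lam)             # lifetime, mean 1/lam
    y = rng.expovariate(mu)              # replacement time, mean 1/mu
    sims.append(x / (x + y))
sim_mean = sum(sims) / len(sims)

print(round(prob, 3), round(mean, 3), round(sim_mean, 3))
```

The simulated mean illustrates that E(A) differs from the ratio $E(X)/(E(X) + E(Y))$: the mean of a ratio is in general not the ratio of the means.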


4.2.5 Maximum of Random Variables 


Let (X, Y) be a random vector with joint density $f_{X,Y}(x,y)$ and

$$Z = \max(X, Y).$$

The random event 'Z ≤ z' occurs if and only if both X and Y assume values which do not exceed z. Hence (Figure 4.10),

$$F_Z(z) = P(Z \le z) = P(X \le z, Y \le z) = \int_{-\infty}^{z} \int_{-\infty}^{z} f_{X,Y}(x,y)\, dx\, dy.$$


Figure 4.10 Integration region for the maximum


Example 4.15 The random vector (X, Y) has a Marshall-Olkin distribution with joint distribution function given by (3.27): For $\lambda_1 > 0$, $\lambda_2 > 0$, $\lambda \ge 0$, and $x, y \ge 0$,

$$F_{X,Y}(x,y) = 1 - e^{-(\lambda_1 + \lambda)x} - e^{-(\lambda_2 + \lambda)y} + e^{-\lambda_1 x - \lambda_2 y - \lambda \max(x,y)}$$

so that

$$P(Z > z) = 1 - F_Z(z) = 1 - F_{X,Y}(z,z) = e^{-(\lambda_1 + \lambda)z} + e^{-(\lambda_2 + \lambda)z} - e^{-(\lambda_1 + \lambda_2 + \lambda)z}.$$

Hence, by formula (2.52), page 64, the mean value of Z = max(X, Y) is

$$E(Z) = \frac{1}{\lambda_1 + \lambda} + \frac{1}{\lambda_2 + \lambda} - \frac{1}{\lambda_1 + \lambda_2 + \lambda}. \tag{4.26}$$

As a practical application, if a system consists of two subsystems with respective lifetimes X and Y, and the system fails when both subsystems have failed, then its mean lifetime is given by (4.26). In particular, in case of independent, identically distributed lifetimes X and Y (i.e., $\lambda = 0$, $\lambda_1 = \lambda_2$):

$$E(Z) = \frac{3}{2\lambda_1}.$$

In this case, a 'spare' system increases the mean system life by the factor 1.5. □


Now the random variables $X_1, X_2, \ldots, X_n$ are assumed to be independent with distribution functions $F_{X_i}(z) = P(X_i \le z)$, $i = 1, 2, \ldots, n$. Let

$$Z = \max\{X_1, X_2, \ldots, X_n\}. \tag{4.27}$$

Since the random event 'Z ≤ z' occurs if and only if

$$'X_1 \le z,\; X_2 \le z,\; \ldots,\; X_n \le z',$$

and the events '$X_i \le z$' are independent, the distribution function of Z is

$$F_Z(z) = F_{X_1}(z)\, F_{X_2}(z) \cdots F_{X_n}(z). \tag{4.28}$$


Example 4.16 A system consists of n subsystems $s_1, s_2, \ldots, s_n$. All of them start operating at time point t = 0 and fail independently of each other. The system operates as long as at least one of its subsystems is operating. Thus, n - 1 out of the n subsystems are virtually spare systems. Hence, if $X_i$ denotes the lifetime of subsystem $s_i$, then the lifetime Z of the system is given by (4.27) and has distribution function (4.28). In engineering reliability, systems like that are called parallel systems. Their failure behavior is illustrated by Figure 4.11. Each of the n edges in the graph with parallel edges depicted there symbolizes a subsystem. The system works if and only if there is at least one 'operating edge', which connects entrance node en and exit node ex.


As a special case, let us assume that the lifetimes $X_i$ are identically exponentially distributed with parameter $\lambda$:

$$F_{X_i}(x) = 1 - e^{-\lambda x}, \quad \lambda > 0,\; i = 1, 2, \ldots, n.$$


Figure 4.11 Illustration of a parallel system

Then the system lifetime has distribution function $F_Z(z) = (1 - e^{-\lambda z})^n$, $z \ge 0$, so that the mean system lifetime is

$$E(Z) = \int_0^{\infty} \left[ 1 - (1 - e^{-\lambda z})^n \right] dz.$$

The substitution $x = 1 - e^{-\lambda z}$ yields

$$E(Z) = \frac{1}{\lambda} \int_0^1 \frac{1 - x^n}{1 - x}\, dx = \frac{1}{\lambda} \int_0^1 \left[ 1 + x + \cdots + x^{n-1} \right] dx.$$

Hence,

$$E(Z) = \frac{1}{\lambda} \left[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right].$$

Because of the divergence of the harmonic series $\sum_{i=1}^{\infty} 1/i$, an arbitrarily large mean system lifetime can be achieved by installing a sufficient number of subsystems. □
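The harmonic-sum formula of Example 4.16 can be compared with a direct numerical evaluation of $E(Z) = \int_0^{\infty} [1 - (1 - e^{-\lambda z})^n]\,dz$. The sketch below is only illustrative; the function names, the truncation of the integral at $60/\lambda$, and the number of integration steps are assumptions.

```python
import math

def mean_parallel_lifetime(n, lam):
    """E(Z) = (1/lam) * (1 + 1/2 + ... + 1/n) for n exponential subsystems."""
    return sum(1.0 / i for i in range(1, n + 1)) / lam

def mean_parallel_lifetime_numeric(n, lam, steps=200_000):
    """Midpoint-rule integral of the survival function 1 - (1 - e^{-lam z})^n.

    The integral is truncated at 60/lam, where the integrand is of the
    order n * e^{-60} and therefore negligible.
    """
    upper = 60.0 / lam
    dz = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * dz
        total += 1.0 - (1.0 - math.exp(-lam * z)) ** n
    return total * dz

for n in (1, 2, 5, 10):
    print(n, mean_parallel_lifetime(n, 2.0), mean_parallel_lifetime_numeric(n, 2.0))
```

The slow (logarithmic) growth of the harmonic sum also shows that each additional subsystem contributes less to the mean lifetime than the previous one.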


4.2.6 Minimum of Random Variables 


Let the random vector (X, Y) have the joint density $f_{X,Y}(x,y)$, and let Z = min(X, Y) have distribution function $F_Z(z) = P(Z \le z)$. Then, by integrating over the hatched area in Figure 4.12,

$$F_Z(z) = \iint_{\{(x,y);\; x \le z \text{ or } y \le z\}} f_{X,Y}(x,y)\, dx\, dy.$$

Integrating over the non-hatched area yields

$$\bar{F}_Z(z) = P(Z > z) = P(X > z, Y > z) = \int_z^{\infty} \int_z^{\infty} f_{X,Y}(x,y)\, dx\, dy.$$

For independent X and Y,

$$\bar{F}_Z(z) = \bar{F}_X(z) \cdot \bar{F}_Y(z).$$


Figure 4.12 Integration region for the minimum


Example 4.17 A system consists of two subsystems with respective lifetimes X and Y. The system fails as soon as the first subsystem fails. Then Z = min(X, Y) is the lifetime of the system. Let, for instance, the random vector (X, Y) have the Gumbel distribution (3.28) with parameters $\lambda_1 = \lambda_2 = 1$ and parameter $\lambda$, $0 \le \lambda \le 1$. Then

$$\bar{F}_Z(z) = P(Z > z) = e^{-2z - \lambda z^2}, \quad z \ge 0,$$

and, by formula (2.52), the mean lifetime is

$$E(Z) = \int_0^{\infty} e^{-2z - \lambda z^2}\, dz.$$


Figure 4.13 Decrease of the mean lifetime for $\lambda \to 1$


Figure 4.13 shows the graph of the mean lifetime depending on $\lambda$. With increasing dependence between X and Y ($\lambda \to 1$), the mean lifetime decreases almost linearly from 0.5 (independence) to about 0.38. (The correlation coefficient between X and Y is given on page 138.) □
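The dependence of the mean lifetime on $\lambda$ in Example 4.17 can be tabulated numerically. The sketch below integrates the survival function $e^{-2z - \lambda z^2}$ by a midpoint rule. As a cross-check it also uses a closed form obtained by completing the square, $E(Z) = \frac{\sqrt{\pi}}{2\sqrt{\lambda}}\, e^{1/\lambda}\, \mathrm{erfc}(1/\sqrt{\lambda})$ for $\lambda > 0$; this closed form is a derivation added here, not a formula from the text, and the function names and integration settings are assumptions.

```python
import math

def mean_min_lifetime(lam, upper=20.0, steps=200_000):
    """Midpoint-rule integral of exp(-2z - lam*z^2) over [0, upper].

    Beyond z = 20 the integrand is below e^{-40} and is neglected.
    """
    dz = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * dz
        total += math.exp(-2.0 * z - lam * z * z)
    return total * dz

def mean_min_lifetime_closed(lam):
    """Closed form for lam > 0 (completing the square in the exponent).

    Not suitable for very small lam, where exp(1/lam) overflows.
    """
    root = math.sqrt(lam)
    return (math.sqrt(math.pi) / (2.0 * root)
            * math.exp(1.0 / lam) * math.erfc(1.0 / root))

for lam in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(lam, round(mean_min_lifetime(lam), 4))
```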


Now let $X_1, X_2, \ldots, X_n$ be independent random variables and

$$Z = \min\{X_1, X_2, \ldots, X_n\}.$$

Then $P(Z > z) = P(X_1 > z, X_2 > z, \ldots, X_n > z)$ so that

$$\bar{F}_Z(z) = P(Z > z) = \bar{F}_{X_1}(z) \cdot \bar{F}_{X_2}(z) \cdots \bar{F}_{X_n}(z). \tag{4.29}$$

Thus, the distribution function of the minimum of n independent random variables is

$$F_Z(z) = P(Z \le z) = 1 - \bar{F}_{X_1}(z) \cdot \bar{F}_{X_2}(z) \cdots \bar{F}_{X_n}(z). \tag{4.30}$$


Generalizing example 4.17, if a system consisting of n independently operating subsystems $s_1, s_2, \ldots, s_n$ starts operating at time z = 0 and fails as soon as one of its subsystems fails, then its survival function is given by (4.29). In Figure 4.14, if the chain between entrance node en and exit node ex of the graph is interrupted by a failed subsystem, then the system as a whole fails. In reliability engineering, systems like this are called series systems. If, in particular, the lifetimes of the subsystems are identically exponentially distributed with parameter $\lambda$, then $\bar{F}_Z(z) = e^{-n\lambda z}$, $z \ge 0$, and the corresponding mean system lifetime is $E(Z) = 1/(n\lambda)$. Every installation of another subsystem decreases both the survival probability and the mean lifetime of a series system. For instance, if one subsystem survives the interval [0,1] with probability $e^{-\lambda} = 0.99$, then 100 such subsystems in series survive this interval only with probability $0.99^{100} \approx 0.37$. Therefore, in technological designs, combinations of parallel and series systems are preferred. □


Figure 4.14 Illustration of a series system


4.3 SUMS OF RANDOM VARIABLES


4.3.1 Sums of Discrete Random Variables 


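Mean Value of a Sum The random vector (X, Y) with discrete components X and Y has the joint distribution

$$\{r_{ij} = P(X = x_i \cap Y = y_j);\; i, j = 0, 1, \ldots\},$$

and the marginal distributions

$$p_i = P(X = x_i) = \sum_{j=0}^{\infty} r_{ij}, \qquad q_j = P(Y = y_j) = \sum_{i=0}^{\infty} r_{ij}.$$

Then the mean value of the sum Z = X + Y is

$$E(Z) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (x_i + y_j)\, r_{ij} = \sum_{i=0}^{\infty} x_i \sum_{j=0}^{\infty} r_{ij} + \sum_{j=0}^{\infty} y_j \sum_{i=0}^{\infty} r_{ij} = \sum_{i=0}^{\infty} x_i\, p_i + \sum_{j=0}^{\infty} y_j\, q_j.$$

Thus,

$$E(X + Y) = E(X) + E(Y). \tag{4.31}$$

By induction, for any discrete random variables $X_1, X_2, \ldots, X_n$,

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n). \tag{4.32}$$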


Distribution of a Sum Let X and Y be independent random variables with common range R = {0, 1, ...} and probability distributions

$$\{p_i = P(X = i);\; i = 0, 1, \ldots\} \quad\text{and}\quad \{q_j = P(Y = j);\; j = 0, 1, \ldots\}.$$

Then,

$$P(Z = k) = P(X + Y = k) = \sum_{i=0}^{k} P(X = i)\, P(Y = k - i).$$

Letting $r_k = P(Z = k)$ yields for all k = 0, 1, ...

$$r_k = p_0 q_k + p_1 q_{k-1} + \cdots + p_k q_0.$$

Thus, according to formula (2.114) on page 98, the discrete probability distribution $\{r_k;\; k = 0, 1, \ldots\}$ is the convolution of the probability distributions of X and Y. The z-transforms of X and Y are defined by (2.110):

$$M_X(z) = \sum_{i=0}^{\infty} p_i z^i, \qquad M_Y(z) = \sum_{i=0}^{\infty} q_i z^i.$$

By (2.116),

$$M_Z(z) = M_X(z)\, M_Y(z). \tag{4.33}$$

The z-transform $M_Z(z)$ of the sum Z = X + Y of two independent discrete random variables X and Y with common range R = {0, 1, ...} is equal to the product of the z-transforms of X and Y.


By induction, if $Z = X_1 + X_2 + \cdots + X_n$ with independent $X_i$, then

$$M_Z(z) = M_{X_1}(z)\, M_{X_2}(z) \cdots M_{X_n}(z). \tag{4.34}$$


Example 4.18 Let $Z = X_1 + X_2 + \cdots + X_n$ be a sum of independent random variables, where $X_i$ has a Poisson distribution with parameter $\lambda_i$; $i = 1, 2, \ldots, n$, i.e.,

$$P(X_i = k) = \frac{\lambda_i^k}{k!}\, e^{-\lambda_i}, \quad k = 0, 1, \ldots$$

The z-transform of $X_i$ is (page 91)

$$M_{X_i}(z) = e^{\lambda_i (z - 1)}. \tag{4.35}$$

From (4.34),

$$M_Z(z) = e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_n)(z - 1)}.$$

The functional structure of $M_Z(z)$ is the same as the one of $M_{X_i}(z)$. Thus, the sum of independent, Poisson distributed random variables has a Poisson distribution, the parameter of which is the sum of the parameters of the Poisson distributions of these random variables. (This way of reasoning is only possible because, as pointed out in section 2.5, to every probability distribution there belongs exactly one z-transform and vice versa.) □
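The statement of Example 4.18 can also be verified directly from the convolution formula $r_k = p_0 q_k + p_1 q_{k-1} + \cdots + p_k q_0$, without z-transforms. The following sketch convolves two truncated Poisson distributions and compares the result with the Poisson distribution whose parameter is the sum. The function names, the parameters 1.5 and 2.5, and the truncation point are illustrative assumptions; up to index `kmax` the convolution of the truncated lists is exact, because $r_k$ only involves $p_0, \ldots, p_k$ and $q_0, \ldots, q_k$.

```python
import math

def poisson_pmf(lam, kmax):
    """Probabilities P(X = k), k = 0, ..., kmax, of a Poisson distribution."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]

def convolve(p, q):
    """Discrete convolution: r_k = p_0*q_k + p_1*q_{k-1} + ... + p_k*q_0."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            r[i + j] += p_i * q_j
    return r

kmax = 20
r = convolve(poisson_pmf(1.5, kmax), poisson_pmf(2.5, kmax))[:kmax + 1]
target = poisson_pmf(1.5 + 2.5, kmax)
max_diff = max(abs(a - b) for a, b in zip(r, target))
print(max_diff)  # differences are at the level of floating-point rounding
```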


Example 4.19 Let $Z = X_1 + X_2 + \cdots + X_n$ be a sum of independent random variables, where $X_i$ has a binomial distribution with parameters $n_i$ and $p_i$, $i = 1, 2, \ldots, n$, i.e.,

$$P(X_i = k) = \binom{n_i}{k} p_i^k (1 - p_i)^{n_i - k}, \quad k = 0, 1, \ldots, n_i.$$

Then (page 98), the z-transform of $X_i$ is

$$M_{X_i}(z) = [p_i z + 1 - p_i]^{n_i}.$$

Hence, the z-transform of the sum is

$$M_Z(z) = \prod_{i=1}^{n} [p_i z + 1 - p_i]^{n_i}.$$

Under the additional assumption that

$$p_i = p, \quad i = 1, 2, \ldots, n,$$

this representation of the z-transform of Z simplifies to

$$M_Z(z) = [p z + 1 - p]^{n_1 + n_2 + \cdots + n_n}.$$

Comparing this $M_Z(z)$ with $M_{X_i}(z)$ shows that in case of $p_i = p$ the sum Z has again a binomial distribution, but with parameters p and $n_1 + n_2 + \cdots + n_n$. □


4.3.2 Sums of Continuous Random Variables


4.3.2.1 Sum of Two Random Variables 


Distribution Let the random vector (X, Y) have the joint density $f_{X,Y}(x,y)$. Based on this information, the distribution function $F_Z(z) = P(Z \le z)$ of the sum Z = X + Y has to be determined.


Figure 4.15 Integration region for the sum


Figure 4.15 illustrates the situation: Those realizations (x, y) of (X, Y) which satisfy the condition $x + y \le z$ or $y \le z - x$, respectively, are in the hatched area. If the vector (X, Y) assumes such a realization, then the random event 'X + Y ≤ z' occurs. Hence, $F_Z(z)$ is given by the double integral

$$F_Z(z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z - x} f_{X,Y}(x,y)\, dy\, dx.$$

Differentiation with regard to z yields the density of Z:

$$f_Z(z) = \frac{d}{dz} \int_{-\infty}^{+\infty} \int_{-\infty}^{z - x} f_{X,Y}(x,y)\, dy\, dx = \int_{-\infty}^{+\infty} \left( \frac{d}{dz} \int_{-\infty}^{z - x} f_{X,Y}(x,y)\, dy \right) dx$$

so that

$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X,Y}(x, z - x)\, dx. \tag{4.36}$$

If X and Y are nonnegative, then $f_{X,Y}(x,y)$ is 0 for x < 0 and/or y < 0. In this case, only such x and z - x can contribute to the integral in (4.36) which satisfy $x \ge 0$ and $z - x \ge 0$. Hence,

$$f_Z(z) = \int_0^z f_{X,Y}(x, z - x)\, dx. \tag{4.37}$$

If X and Y are independent, then $f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ so that in this case formulas (4.36) and (4.37) become

$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z - x)\, dx, \tag{4.38}$$

$$f_Z(z) = \int_0^z f_X(x)\, f_Y(z - x)\, dx. \tag{4.39}$$

These integrals are the convolutions of $f_X$ and $f_Y$ (formulas (2.125) and (2.126)).


The density of the sum of two independent random variables X and Y is the convolution of the densities of X and Y.

By formula (2.127), the Laplace transform of the density of the sum of two independent random variables X and Y is equal to the product of their Laplace transforms:

$$\hat{f}_Z(s) = \hat{f}_X(s)\, \hat{f}_Y(s). \tag{4.40}$$

The distribution function of Z for independent X and Y is obtained by integrating the density $f_Z(z)$ given by (4.38) and (4.39), respectively. A heuristic approach is the following one: On condition Y = y, the distribution function of Z = X + Y is

$$P(Z \le z \mid Y = y) = P(X + y \le z) = P(X \le z - y) = F_X(z - y).$$

Since $dF_Y(y) = f_Y(y)\, dy$ is the 'probability' of the event 'Y = y' (see the comment after formula (2.50), page 61),

$$F_Z(z) = \int_{-\infty}^{+\infty} F_X(z - y)\, f_Y(y)\, dy, \tag{4.41}$$

or

$$F_Z(z) = \int_{-\infty}^{+\infty} F_X(z - y)\, dF_Y(y). \tag{4.42}$$

For nonnegative X and Y the formulas (4.41) and (4.42) become

$$F_Z(z) = \int_0^z F_X(z - y)\, f_Y(y)\, dy, \tag{4.43}$$

$$F_Z(z) = \int_0^z F_X(z - y)\, dF_Y(y). \tag{4.44}$$


In the terminology used so far, the integral in (4.41) is the convolution of the functions $F_X$ and $f_Y$. The integral (4.42), however, is called the convolution of the distribution functions $F_X$ and $F_Y$. Of course, the roles of X and Y can be exchanged in formulas (4.36) to (4.44) since X + Y = Y + X.


Example 4.20 It is assumed that the random vector (X, Y) has a uniform distribution over the square $[0 \le x \le T,\; 0 \le y \le T]$, i.e.,

$$f_{X,Y}(x,y) = \begin{cases} 1/T^2, & 0 \le x, y \le T, \\ 0, & \text{otherwise.} \end{cases}$$

By theorem 3.1, this assumption implies that X and Y are independent random variables which are uniformly distributed in the interval [0, T]. Hence, formula (4.39) is applicable for determining the density of Z = X + Y:

$$f_Z(z) = \int_0^z f_{X,Y}(x, z - x)\, dx = \begin{cases} \displaystyle\int_0^z \frac{1}{T^2}\, dx, & 0 \le z \le T, \\[3mm] \displaystyle\int_{z - T}^{T} \frac{1}{T^2}\, dx, & T < z \le 2T. \end{cases}$$

Therefore,

$$f_Z(z) = \begin{cases} \dfrac{z}{T^2}, & 0 \le z \le T, \\[3mm] \dfrac{1}{T^2}\,(2T - z), & T < z \le 2T. \end{cases}$$

Figure 4.16 shows the graph of $f_Z(z)$. It motivates the name triangular distribution. But it is also called Simpson distribution. The corresponding distribution function is

$$F_Z(z) = \int_0^z f_Z(u)\, du = \begin{cases} \dfrac{1}{2} \left( \dfrac{z}{T} \right)^2, & 0 \le z \le T, \\[3mm] \dfrac{z}{T} \left( 2 - \dfrac{z}{2T} \right) - 1, & T < z \le 2T. \end{cases}$$


Figure 4.16 Density of the triangular distribution


The symmetry of the density with regard to z = T implies that E(Z) = T. Hence, E(Z) = E(X) + E(Y). □
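
The convolution formula (4.39) can be evaluated numerically for the two uniform densities of Example 4.20 and compared with the triangular density $z/T^2$ on [0, T] and $(2T - z)/T^2$ on (T, 2T]. The sketch uses a midpoint rule; the function names, the value T = 2, and the number of integration steps are assumptions made for illustration.

```python
def uniform_density(x, T):
    """Density of the uniform distribution on [0, T]."""
    return 1.0 / T if 0.0 <= x <= T else 0.0

def sum_density(z, f_x, f_y, steps=20_000):
    """Midpoint approximation of (4.39): integral of f_X(x) f_Y(z - x) over [0, z]."""
    if z <= 0.0:
        return 0.0
    dx = z / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += f_x(x) * f_y(z - x)
    return total * dx

def triangular_density(z, T):
    """Closed-form density of the sum of two independent uniforms on [0, T]."""
    if 0.0 <= z <= T:
        return z / T ** 2
    if T < z <= 2.0 * T:
        return (2.0 * T - z) / T ** 2
    return 0.0

T = 2.0
f = lambda x: uniform_density(x, T)
for z in (0.5, 1.0, 2.0, 3.0, 3.5):
    print(z, sum_density(z, f, f), triangular_density(z, T))
```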
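
Example 4.21 Let the random vector (X, Y) have the joint density

$$f_{X,Y}(x,y) = \lambda\mu\, e^{-(\lambda x + \mu y)}, \quad x \ge 0,\; y \ge 0;\; \lambda > 0,\; \mu > 0.$$

From example 4.13 we know that X and Y are independent and have exponential distributions with parameters $\lambda$ and $\mu$, respectively. Hence, formula (4.39) is applicable to determine the density of the sum Z = X + Y:

$$f_Z(z) = \int_0^z \lambda e^{-\lambda x}\, \mu e^{-\mu(z - x)}\, dx = \lambda\mu\, e^{-\mu z} \int_0^z e^{-(\lambda - \mu)x}\, dx.$$

Two cases have to be considered separately:

1) $\lambda = \mu$: $\; f_Z(z) = \lambda^2 z\, e^{-\lambda z}$, $z \ge 0$.

This is an Erlang distribution with parameters $\lambda$ and n = 2 (page 75).

2) $\lambda \ne \mu$: $\; f_Z(z) = \dfrac{\lambda\mu}{\lambda - \mu} \left[ e^{-\mu z} - e^{-\lambda z} \right]$, $z \ge 0$.

The mean value of Z = X + Y is ($\lambda \ne \mu$)

$$E(Z) = \int_0^{\infty} z\, f_Z(z)\, dz = \frac{\lambda\mu}{\lambda - \mu} \left[ \int_0^{\infty} z\, e^{-\mu z}\, dz - \int_0^{\infty} z\, e^{-\lambda z}\, dz \right] = \frac{1}{\lambda} + \frac{1}{\mu} = E(X) + E(Y). \qquad \square$$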


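Mean Value of a Sum In the previous two examples, the mean value of a sum proved to be equal to the sum of the mean values of the terms. This is generally true, whether X and Y are independent or not (but E(X) and E(Y) must be finite):

$$E(X + Y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x + y)\, f_{X,Y}(x,y)\, dy\, dx$$

$$= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x\, f_{X,Y}(x,y)\, dy\, dx + \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} y\, f_{X,Y}(x,y)\, dx\, dy$$

$$= \int_{-\infty}^{+\infty} x \left( \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\, dy \right) dx + \int_{-\infty}^{+\infty} y \left( \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\, dx \right) dy.$$

Now, by using properties (3.11) of the joint density,

$$E(X + Y) = \int_{-\infty}^{+\infty} x\, f_X(x)\, dx + \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy = E(X) + E(Y). \tag{4.45}$$


The mean value of the sum of two random variables is equal to the sum of their mean values.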


Variance of a Sum To present the variance of the sum Z = X + Y in a convenient way, we need again the concept of the covariance between X and Y as defined by (3.37) or (3.38) (page 135):

$$Cov(X, Y) = E\big([X - E(X)] \cdot [Y - E(Y)]\big).$$

By definition (2.60) of the variance,

$$Var(Z) = E(Z - E(Z))^2 = E(X + Y - E(X) - E(Y))^2$$

$$= E\big([X - E(X)] + [Y - E(Y)]\big)^2$$

$$= E(X - E(X))^2 + 2E\big([X - E(X)]\,[Y - E(Y)]\big) + E(Y - E(Y))^2.$$

Hence, the variance of the sum is

$$Var(X + Y) = Var(X) + 2\,Cov(X, Y) + Var(Y). \tag{4.46}$$

If X and Y are independent, then Cov(X, Y) = 0. In this case,

$$Var(X + Y) = Var(X) + Var(Y). \tag{4.47}$$


The variance of the sum of two independent random variables is equal to the sum of their variances.


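Bivariate Normal Distribution Let the random vector (X, Y) have a bivariate normal distribution with parameters

$$\mu_x, \mu_y, \sigma_x, \sigma_y, \text{ and } \rho;\quad -\infty < \mu_x, \mu_y < \infty,\; \sigma_x > 0,\; \sigma_y > 0,\; -1 < \rho < 1.$$

Then (X, Y) has the joint density (page 131)

$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left( \frac{(x - \mu_x)^2}{\sigma_x^2} - 2\rho\, \frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2} \right) \right\}.$$

To determine the density $f_Z(z)$ of Z = X + Y, formula (4.36) has to be applied. Letting

$$u = x - \mu_x \quad\text{and}\quad v = z - \mu_x - \mu_y$$

yields $f_Z(z)$ in the form

$$f_Z(z) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left( \frac{u^2}{\sigma_x^2} - 2\rho\, \frac{u(v - u)}{\sigma_x\sigma_y} + \frac{(v - u)^2}{\sigma_y^2} \right) \right\} du.$$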


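The following transformation in the integrand of this formula requires some routine effort, but will prove to be advantageous:

$$\frac{u^2}{\sigma_x^2} - 2\rho\, \frac{u(v - u)}{\sigma_x\sigma_y} + \frac{(v - u)^2}{\sigma_y^2} = \frac{\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2}{\sigma_x^2\sigma_y^2}\, u^2 - 2\, \frac{\sigma_x + \rho\sigma_y}{\sigma_x\sigma_y^2}\, uv + \frac{1}{\sigma_y^2}\, v^2$$

$$= \left( \frac{\sqrt{\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2}}{\sigma_x\sigma_y}\, u - \frac{\sigma_x + \rho\sigma_y}{\sigma_y\sqrt{\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2}}\, v \right)^2 + \frac{1 - \rho^2}{\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2}\, v^2.$$

Now this expression is inserted into the integrand, and then the following substitution is made:

$$t = \frac{1}{\sqrt{2(1 - \rho^2)}} \left( \frac{\sqrt{\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2}}{\sigma_x\sigma_y}\, u - \frac{\sigma_x + \rho\sigma_y}{\sigma_y\sqrt{\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2}}\, v \right).$$

These transformations result in the following form for $f_Z(z)$:

$$f_Z(z) = \frac{1}{\pi\sqrt{2(\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2)}} \exp\left( -\frac{v^2}{2(\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2)} \right) \int_{-\infty}^{+\infty} e^{-t^2}\, dt.$$

Since $\int_{-\infty}^{+\infty} e^{-t^2}\, dt = \sqrt{\pi}$, the final result is

$$f_Z(z) = \frac{1}{\sqrt{2\pi(\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2)}} \exp\left( -\frac{(z - \mu_x - \mu_y)^2}{2(\sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2)} \right), \quad -\infty < z < \infty. \tag{4.48}$$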


Comparing $f_Z(z)$ with the density (2.81) of the one-dimensional normal distribution verifies the following corollary from (4.48):


If the random vector (X, Y) has a two-dimensional normal distribution with parameters

$$\mu_x, \mu_y, \sigma_x, \sigma_y, \text{ and } \rho;\quad -\infty < \mu_x, \mu_y < \infty,\; \sigma_x > 0,\; \sigma_y > 0,\; -1 < \rho < 1,$$

then the sum Z = X + Y has a one-dimensional normal distribution with parameters

$$E(Z) = \mu_x + \mu_y \quad\text{and}\quad Var(Z) = \sigma_x^2 + 2\rho\sigma_x\sigma_y + \sigma_y^2. \tag{4.49}$$


The Laplace transform of any $N(\mu, \sigma^2)$-distributed random variable is, by formula 
(2.129), page 102, 

$$\hat f(s) = e^{-\mu s + \frac{1}{2}\sigma^2 s^2}.$$

If X and Y are independent, then the Laplace transform of Z is the product of the 
Laplace transforms of X and Y: 

$$\hat f_Z(s) = e^{-\mu_X s + \frac{1}{2}\sigma_X^2 s^2} \cdot e^{-\mu_Y s + \frac{1}{2}\sigma_Y^2 s^2} = e^{-(\mu_X+\mu_Y)s + \frac{1}{2}(\sigma_X^2+\sigma_Y^2)s^2}.$$

This proves once more that the sum Z = X + Y of two independent, normally distributed 
random variables X and Y is normally distributed with parameters 

$$E(Z) = \mu_X + \mu_Y \ \text{ and } \ \mathrm{Var}(Z) = \sigma_X^2 + \sigma_Y^2, \ \text{ i.e., } \ Z = N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2). \tag{4.50}$$
Example 4.22 Let X and Y be the annual profits Bobo makes from her investments in 
equities and bonds, respectively. She has analyzed her profits over a couple of years, 
and knows that the random vector (X, Y) has a bivariate normal distribution with 
parameters (in dollars, influence of inflation eliminated) 

$$\mu_X = 2160,\quad \mu_Y = 3420,\quad \sigma_X = 1830,\quad \sigma_Y = 2840,\quad \text{and } \rho = -0.28.$$

(1) What probability distribution has Bobo's total profit Z = X + Y? 
(2) What is the probability that her total 'profit' is actually negative? 


(1) According to (4.49), Z has a normal distribution with parameters 

$$\mu_Z = 5580, \qquad \sigma_Z^2 = \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2 = 8\,504\,068,$$

so that $\sigma_Z \approx 2916$. 


(2) $P(Z < 0) = P\left(\frac{Z - 5580}{2916} < \frac{-5580}{2916}\right) \approx \Phi(-1.91) \approx 0.028.$ □
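
The arithmetic of Example 4.22 can be checked with a few lines of Python. This is only a minimal sketch: the helper names `normal_cdf` and `sum_parameters` are illustrative choices, not notation from the text, and the normal distribution function is evaluated through the error function of the standard library.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """Distribution function of N(mean, std^2), expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def sum_parameters(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and standard deviation of Z = X + Y according to (4.49)."""
    mean_z = mu_x + mu_y
    var_z = sigma_x**2 + 2.0 * rho * sigma_x * sigma_y + sigma_y**2
    return mean_z, math.sqrt(var_z)

# Parameters of Example 4.22
mean_z, std_z = sum_parameters(2160, 3420, 1830, 2840, -0.28)
p_loss = normal_cdf(0.0, mean_z, std_z)   # P(Z < 0)
print(mean_z, round(std_z), round(p_loss, 3))
```

The same two helpers can be reused for the other bivariate normal examples of this section by changing the parameters.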


Continuation of Example 3.7 (page 131) The daily consumptions of tap water X 
and Y of two neighboring towns have a bivariate normal distribution with parameters 

$$\mu_X = \mu_Y = 16\ [10^3\,\mathrm{m}^3],\quad \sigma_X = \sigma_Y = 2\ [10^3\,\mathrm{m}^3],\quad \text{and } \rho = 0.5.$$

What is the probability that the total daily tap water consumption Z = X + Y of the two 
towns exceeds the amount of 36 [$10^3\,\mathrm{m}^3$], which is the maximal amount manageable 
by the municipality? 
Z has a normal distribution with parameters 

$$\mu_Z = 32\ [10^3\,\mathrm{m}^3] \quad\text{and}\quad \sigma_Z^2 = \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2 = 12\ [10^6\,\mathrm{m}^6],$$

so that $\sigma_Z \approx 3.464$. Hence, 

$$P(Z > 36) = P\left(\frac{Z-32}{3.464} > \frac{36-32}{3.464}\right) = 1 - \Phi(1.155) = 0.124.$$ □


4.3.2.2 Sum of n>2 Random Variables 


In this section, $X_i$; i = 1, 2, ..., n; are random variables with respective distribution 
functions, densities, mean values, and variances 

$$F_{X_i}(x),\quad f_{X_i}(x),\quad \mu_i = E(X_i),\quad \text{and } \sigma_i^2 = \mathrm{Var}(X_i);\quad i = 1, 2, ..., n.$$

The joint density of $\mathbf{X} = (X_1, X_2, ..., X_n)$ is denoted as $f_{\mathbf{X}}(x_1, x_2, ..., x_n)$. All mean 
values and variances are assumed to be finite. The covariance between $X_i$ and $X_j$ is 
according to (3.37) defined as 

$$\mathrm{Cov}(X_i, X_j) = E\big([X_i - E(X_i)]\,[X_j - E(X_j)]\big).$$

The sum of the $X_i$ is again denoted as $Z = X_1 + X_2 + \cdots + X_n$, and its distribution 
function and density as $F_Z(z)$ and $f_Z(z)$. 
Mean Value of a Sum 

$$E(Z) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n). \tag{4.51}$$

The mean value of the sum of n (discrete or continuous) random variables 
is equal to the sum of the mean values of these random variables. 


This can be proved analogously to formula (4.45) by making use of the relationship 
(3.54) between $f_{\mathbf{X}}$ and the $f_{X_i}$, or simply by induction starting with formula (4.45): 
If, for instance, the mean value $E(X_1 + X_2 + X_3)$ has to be determined, let 
$X = X_1 + X_2$ and $Y = X_3$ 
and apply (4.45) as follows: 

$$E(X_1 + X_2 + X_3) = E(X) + E(Y) = E(X_1 + X_2) + E(X_3) = E(X_1) + E(X_2) + E(X_3).$$


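Variance of a Sum The variance of the sum $Z = \sum_{i=1}^{n} X_i$ of n random variables $X_i$ 
results from its representation as 

$$\mathrm{Var}(Z) = E(Z - E(Z))^2 = E\big([X_1 - E(X_1)] + [X_2 - E(X_2)] + \cdots + [X_n - E(X_n)]\big)^2.$$

Since 
$\mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i)$ and $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$, 
the generalization of formula (4.46) is 

$$\mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i,j=1;\ i<j}^{n} \mathrm{Cov}(X_i, X_j). \tag{4.52}$$

Thus, for uncorrelated $X_i$, 

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n). \tag{4.53}$$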


The variance of a sum of uncorrelated random variables is equal to the sum 
of the variances of these random variables. 


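Let $\alpha_1, \alpha_2, ..., \alpha_n$ be any sequence of finite real numbers. Then, by (2.54) and (2.61), 

$$E\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = \sum_{i=1}^{n} \alpha_i E(X_i), \tag{4.54}$$

$$\mathrm{Var}\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = \sum_{i=1}^{n} \alpha_i^2\, \mathrm{Var}(X_i) + 2\sum_{i,j=1;\ i<j}^{n} \alpha_i \alpha_j\, \mathrm{Cov}(X_i, X_j). \tag{4.55}$$

If the $X_i$ are uncorrelated, the latter formula simplifies to 

$$\mathrm{Var}\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = \sum_{i=1}^{n} \alpha_i^2\, \mathrm{Var}(X_i). \tag{4.56}$$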
Now let us interpret a sequence $\{X_1, X_2, ..., X_n\}$ of independent random variables, identically 
distributed as X, as a random sample taken from X, i.e., a random experiment 
with outcome X is repeated n times. Mean value and variance of X and, hence, of all 
the $X_i$ are $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Then formulas (4.54) and (4.56) simplify to 

$$E\Big(\sum_{i=1}^{n} X_i\Big) = n\mu, \qquad \mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) = n\sigma^2. \tag{4.57}$$

Under the same assumptions, application of (4.54) and (4.56) to the arithmetic mean 

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$$

yields with $\alpha_i = 1/n$ 

$$E(\bar X) = \mu \quad\text{and}\quad \mathrm{Var}(\bar X) = \frac{\sigma^2}{n}. \tag{4.58}$$


Note Formulas (4.51) to (4.58) hold both for discrete and continuous random variables. 


Definition 4.1 A function $\hat\theta = \hat\theta(X_1, X_2, ..., X_n)$ of a sample $\{X_1, X_2, ..., X_n\}$ taken from 
a random variable X is called an unbiased estimator of a parameter $\theta$ of X if 

$$E(\hat\theta) = \theta.$$ ●


Parameters can, e.g., be $\theta = \mu = E(X)$, $\theta = \sigma^2 = \mathrm{Var}(X)$, or $\theta = \beta$ in case of the beta 
or Weibull distribution. The left formula of (4.58) shows that $\hat\theta = \bar X$ is an unbiased 
estimator of $\theta = \mu = E(X)$. Verbally, when estimating the mean value of X by $\bar X$, only 
random deviations of $\bar X$ from $\mu = E(X)$ can be observed, no systematic ones. In addition, 
the right formula in (4.58) shows that with increasing number of measurements 
the accuracy of $\bar X$ as estimator for $\mu$ improves since $\mathrm{Var}(\bar X)$ tends to 0 if $n \to \infty$. 


After having done the n repetitions of the random experiment, a sequence of real 
numbers $\{x_1, x_2, ..., x_n\}$ has been obtained, i.e., $X_i = x_i$; i = 1, 2, ..., n. This sequence 
gives empirical estimators for $\mu$ and $\sigma^2$: 

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2.$$

Now, as announced after formula (3.48), page 143, we are in a position to justify the 
factor $1/(n-1)$ in the formula for $s^2$. 


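Theorem 4.2 Let $\{X_1, X_2, ..., X_n\}$ be a random sample from a random variable X with 
$0 < \sigma^2 = \mathrm{Var}(X) < \infty$. Then the random sample function 

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2$$

is an unbiased estimator of $\sigma^2 = \mathrm{Var}(X)$. 


Proof We have to prove $E(S^2) = \sigma^2$. For this reason, $S^2$ is written in the form 

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} X_i^2 - \frac{n}{n-1}\,\bar X^2. \tag{4.59}$$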
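In what follows, use will be made of the independence of the $X_i$ and of their identical 
distribution as X: 

$$E(X_i X_j) = E(X_i)\,E(X_j) = (E(X))^2 \quad\text{for } i \ne j.$$

Then 

$$E\Big(\sum_{i=1}^{n} X_i^2\Big) = n\,E(X^2), \tag{4.60}$$

so that only the second moment of $\bar X$ has to be determined: 

$$E(\bar X^2) = \frac{1}{n^2}\,E\Big(\Big(\sum_{i=1}^{n} X_i\Big)^2\Big) = \frac{1}{n^2}\,E\Big(\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\Big)$$

$$= \frac{1}{n^2}\,E\Big(\sum_{i=1}^{n} X_i^2\Big) + \frac{1}{n^2}\,E\Big(\sum_{i,j=1;\ i\ne j}^{n} X_i X_j\Big) = \frac{1}{n}\,E(X^2) + \frac{n-1}{n}\,(E(X))^2.$$

Substituting this result and (4.60) into (4.59) gives 

$$E(S^2) = E(X^2) - (E(X))^2 = \sigma^2.$$ ■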
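
Theorem 4.2 can be checked exactly on a small example. The sketch below is an illustration only: it assumes, purely for convenience, that X is uniformly distributed on the hypothetical support {0, 1, 2}, enumerates all samples of size n, and averages $S^2$ with exact rational arithmetic. The function names are not taken from the text.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """S^2 with the factor 1/(n-1), as in Theorem 4.2."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def expected_sample_variance(values, n):
    """Exact E(S^2) for samples of size n from the uniform distribution on `values`."""
    samples = list(product(values, repeat=n))   # all equally likely samples
    return sum(sample_variance(s) for s in samples) / len(samples)

values = (0, 1, 2)   # hypothetical support, chosen only for illustration
mean = Fraction(sum(values), len(values))
variance = sum((v - mean) ** 2 for v in values) / len(values)   # Var(X)

print(expected_sample_variance(values, 3), variance)
```

Replacing the divisor n - 1 by n in `sample_variance` makes the two printed values differ, which is exactly the bias that the factor 1/(n-1) removes.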

Distribution of a Sum The density of the sum $Z = X_1 + X_2 + \cdots + X_n$ of n independent, 
continuous random variables $X_i$ is obtained by repeated application of (4.36), 
page 181. To do this in an efficient way, the convolution symbol '*' is introduced next: 
For any two integrable functions f and g, their convolution is denoted as 

$$f * g(z) = \int_{-\infty}^{+\infty} f(z-x)\,g(x)\,dx = \int_{-\infty}^{+\infty} g(z-x)\,f(x)\,dx = g * f(z). \tag{4.61}$$

Thus, the convolution product is commutative, i.e. 

$$f * g(z) = g * f(z),$$

just as the product of two real numbers: $a \cdot b = b \cdot a$. 


The convolution of the densities $f_{X_1}, f_{X_2}, ..., f_{X_n}$ is obtained by repeated application 
of (4.61): Firstly, $f_{X_1} * f_{X_2}$ is calculated. Then the convolution of $f_{X_3}$ with $f_{X_1} * f_{X_2}$ 
is determined to obtain $f_{X_1} * f_{X_2} * f_{X_3}$, and so on. The final result is the probability 
density of Z: 

$$f_Z(z) = f_{X_1} * f_{X_2} * \cdots * f_{X_n}(z). \tag{4.62}$$


In particular, if the $X_i$ are identically distributed with density f, then $f_Z$ is the n-fold 
convolution of f with itself or, equivalently, the nth convolution power $f^{*(n)}(z)$ of f. 


$f^{*(n)}(z)$ can be recursively obtained as follows: 

$$f^{*(i)}(z) = \int_{-\infty}^{+\infty} f^{*(i-1)}(z-x)\,f(x)\,dx, \tag{4.63}$$

$i = 2, 3, ..., n$; $f^{*(1)}(x) \equiv f(x)$. 
For nonnegative random variables $X_i$, this formula becomes 

$$f^{*(i)}(z) = \int_{0}^{z} f^{*(i-1)}(z-x)\,f(x)\,dx, \quad z \ge 0. \tag{4.64}$$


From (4.40), by induction: The Laplace transform of the density $f_Z$ of the sum of n 
independent random variables $Z = X_1 + X_2 + \cdots + X_n$ is equal to the product of the 
Laplace transforms of the densities of these random variables: 

$$L(f_Z) = L(f_{X_1})\,L(f_{X_2}) \cdots L(f_{X_n}). \tag{4.65}$$

The convolution of the distribution functions $F_{X_1}$ and $F_{X_2}$ is defined by (4.42) as 

$$F_{X_1} * F_{X_2}(z) = \int_{-\infty}^{+\infty} F_{X_2}(z-x)\,dF_{X_1}(x). \tag{4.66}$$

The repeated application of (4.66) yields the distribution function of the sum Z of the 
n independent random variables $X_1, X_2, ..., X_n$ in the form 

$$F_Z(z) = F_{X_1} * F_{X_2} * \cdots * F_{X_n}(z). \tag{4.67}$$

In particular, if the $X_i$ are independent and identically distributed with distribution 
function F, then $F_Z(z)$ is equal to the nth convolution power of F: 

$$F_Z(z) = F^{*(n)}(z). \tag{4.68}$$

$F_Z(z)$ can be recursively obtained from 

$$F^{*(n)}(z) = \int_{-\infty}^{+\infty} F^{*(n-1)}(z-x)\,dF(x); \tag{4.69}$$

$n = 2, 3, ...$; $F^{*(0)}(x) \equiv 1$, $F^{*(1)}(x) \equiv F(x)$. 
If the $X_i$ are nonnegative, then (4.69) becomes 

$$F^{*(n)}(z) = \int_{0}^{z} F^{*(n-1)}(z-x)\,dF(x). \tag{4.70}$$


The convolution powers of any order n can explicitly be given for the Erlang 
distribution and for the normal distribution. 


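Erlang Distribution Let the random variables $X_1$ and $X_2$ be independent and exponentially 
distributed with parameters $\lambda_1$ and $\lambda_2$: 

$$f_{X_i}(x) = \lambda_i e^{-\lambda_i x}, \qquad F_{X_i}(x) = 1 - e^{-\lambda_i x}; \quad x \ge 0,\ i = 1, 2.$$

Formula (4.37) yields the density of $Z = X_1 + X_2$: 

$$f_Z(z) = \int_0^z \lambda_2 e^{-\lambda_2 (z-x)}\,\lambda_1 e^{-\lambda_1 x}\,dx = \lambda_1 \lambda_2\, e^{-\lambda_2 z} \int_0^z e^{-(\lambda_1 - \lambda_2)x}\,dx.$$

At this stage, two cases have to be treated separately: 
1) $\lambda_1 = \lambda_2 = \lambda$: 

$$f_Z(z) = \lambda^2 z\, e^{-\lambda z}, \quad z \ge 0. \tag{4.71}$$

This is the density of an Erlang distribution with parameters n = 2 and $\lambda$ (page 75). 
2) $\lambda_1 \ne \lambda_2$: 

$$f_Z(z) = \frac{\lambda_1 \lambda_2}{\lambda_1 - \lambda_2}\left(e^{-\lambda_2 z} - e^{-\lambda_1 z}\right), \quad z \ge 0.$$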


Now let $X_1, X_2, ..., X_n$ be independent, identically distributed exponential random 
variables with density $f(x) = \lambda e^{-\lambda x}$; $x \ge 0$. The Laplace transform of f is (page 101) 

$$\hat f(s) = \frac{\lambda}{s + \lambda}.$$

Hence, by (4.65), the Laplace transform of the density of $Z = X_1 + X_2 + \cdots + X_n$ is 

$$\hat f_Z(s) = \left(\frac{\lambda}{s + \lambda}\right)^n.$$

The pre-image of this Laplace transform is 

$$f_Z(z) = \lambda\,\frac{(\lambda z)^{n-1}}{(n-1)!}\,e^{-\lambda z}, \quad z \ge 0.$$

(Verify this by calculating the Laplace transform of $f_Z(z)$.) This is the density of an 
Erlang distribution with parameters n and $\lambda$. Hence, the density of an Erlang distribution 
with parameters n and $\lambda$ is the nth convolution power of the density of an exponential 
distribution $f(x) = \lambda e^{-\lambda x}$, which is an Erlang distribution with the parameters 
n = 1 and $\lambda$. 
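
The recursion (4.64) can also be carried out numerically. The following sketch is an illustration under stated assumptions: the rate, step size, and grid length are arbitrary choices, the integral is approximated by the trapezoidal rule, and the function names are hypothetical. It compares the third convolution power of an exponential density with the Erlang density for n = 3.

```python
import math

def convolve_on_grid(f_vals, g_vals, h):
    """Trapezoidal approximation of (f*g)(z) = int_0^z f(z-x) g(x) dx on the grid z_k = k*h."""
    out = []
    for k in range(len(f_vals)):
        if k == 0:
            out.append(0.0)            # integral over an interval of length zero
            continue
        terms = [f_vals[k - j] * g_vals[j] for j in range(k + 1)]
        out.append(h * (sum(terms) - 0.5 * (terms[0] + terms[-1])))
    return out

def erlang_density(z, n, lam):
    """Erlang density with parameters n and lam, as derived from (4.65)."""
    return lam * (lam * z) ** (n - 1) / math.factorial(n - 1) * math.exp(-lam * z)

lam, h, m = 2.0, 0.005, 1001           # illustrative rate, step size, number of grid points
grid = [k * h for k in range(m)]
f = [lam * math.exp(-lam * z) for z in grid]

power = f                              # first convolution power: f itself
for _ in range(2):                     # two applications of (4.64) give the third power
    power = convolve_on_grid(power, f, h)

max_error = max(abs(p - erlang_density(z, 3, lam)) for p, z in zip(power, grid))
print(max_error)
```

For densities without a closed-form convolution power, the same grid recursion can serve as a rough numerical substitute; its accuracy then depends on the step size h.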


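Normal Distribution Let $X_1$ and $X_2$ be two independent, normally distributed random 
variables: $X_1 = N(\mu_1, \sigma_1^2)$, $X_2 = N(\mu_2, \sigma_2^2)$. Then we know from formula (4.50) 
that $Z = X_1 + X_2$ is normally distributed with parameters $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$: 
$Z = N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$. 
By induction: the sum of n independent random variables $X_i = N(\mu_i, \sigma_i^2)$, 
$Z = X_1 + X_2 + \cdots + X_n$, 
is normally distributed with parameters 
$E(Z) = \mu_1 + \mu_2 + \cdots + \mu_n$ and $\mathrm{Var}(Z) = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$, 
or, more concisely, 

$$Z = N\Big(\sum_{i=1}^{n} \mu_i,\ \sum_{i=1}^{n} \sigma_i^2\Big). \tag{4.72}$$

In terms of the density, 

$$f_Z(z) = \frac{1}{\sqrt{2\pi \sum_{i=1}^{n} \sigma_i^2}} \exp\left\{-\frac{\left(z - \sum_{i=1}^{n} \mu_i\right)^2}{2\sum_{i=1}^{n} \sigma_i^2}\right\}, \quad -\infty < z < +\infty.$$

In terms of the convolution, 

$$f_Z(z) = f_{X_1} * f_{X_2} * \cdots * f_{X_n}(z).$$

If the $X_i$ are identically distributed as $X = N(\mu, \sigma^2)$, then each $X_i$ has density 

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < +\infty,$$

and $f_Z$ is the nth convolution power of $f_X$: 

$$f_Z(z) = f_X^{*(n)}(z) = \frac{1}{\sqrt{2\pi n}\,\sigma}\, e^{-\frac{(z-n\mu)^2}{2n\sigma^2}}, \quad -\infty < z < +\infty.$$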


Example 4.23 (1) The daily power consumptions X and Y of two customers have a 
bivariate normal distribution with parameters 

$$\mu_X = 200,\quad \mu_Y = 300,\quad \sigma_X = 26,\quad \sigma_Y = 32\ [\text{in } 10^3\,\mathrm{kWh}],\quad \text{and } \rho = 0.6.$$

Calculate a) the probability that the daily total consumption Z = X + Y of the two 
customers is between 450 and 550, and 


b) the probability of the same event as under a), but on condition that X and Y are 
independent. 


(2) Determine the probability that the daily total consumption of 10 independent 
customers, each of whom has a daily consumption distributed as X given under (1), is between 
1950 and 2050. 


(1) a) By (4.49), the daily total consumption of the two customers has mean value 
E(Z) = 200 + 300 = 500 
and variance/standard deviation 

$$\mathrm{Var}(Z) = \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2 = 26^2 + 2 \cdot 0.6 \cdot 26 \cdot 32 + 32^2 = 2698.4$$

so that 
$\sqrt{\mathrm{Var}(Z)} = 51.95$. 
The desired probability is 

$$P(450 \le Z \le 550) = \Phi\left(\frac{550-500}{51.95}\right) - \Phi\left(\frac{450-500}{51.95}\right) = \Phi(0.962) - \Phi(-0.962) = 2\Phi(0.962) - 1 = 0.664.$$

b) Since X and Y are independent, $\rho = 0$. Hence, 
$\mathrm{Var}(Z) = \sigma_X^2 + \sigma_Y^2 = 26^2 + 32^2 = 1700$ and $\sqrt{\mathrm{Var}(Z)} = 41.23$. 


Therefore, the desired probability is obtained as follows: 

$$P(450 \le Z \le 550) = \Phi\left(\frac{50}{41.23}\right) - \Phi\left(-\frac{50}{41.23}\right) = \Phi(1.213) - \Phi(-1.213) = 2\Phi(1.213) - 1 = 0.774.$$

(2) According to (4.72), the daily total consumption of 10 independent customers has a 
normal distribution with parameters 

$$E(Z) = 10 \cdot 200 = 2000, \quad \mathrm{Var}(Z) = 10 \cdot 26^2 = 6760, \quad \sqrt{\mathrm{Var}(Z)} = 82.22.$$

Therefore, the desired probability is 

$$P(1950 \le Z \le 2050) = \Phi\left(\frac{50}{82.22}\right) - \Phi\left(-\frac{50}{82.22}\right) = \Phi(0.608) - \Phi(-0.608) = 2\Phi(0.608) - 1 = 0.456.$$ □


Example 4.24 A bulk goods freighter has to be loaded with at least 2000 t of iron ore. 
The ore arrives by goods wagons, whose load weights $X_1, X_2, ...$ are independent 
and have an N(50, 64)-distribution. 


How many wagons are needed to make sure that the freighter can be loaded with the 
required minimum load with a probability of at least 0.99? 


Let $Z_n = X_1 + X_2 + \cdots + X_n$. n has to be determined as the smallest integer with 
property $P(Z_n \ge 2000) \ge 0.99$. This relation is equivalent to 

$$P(Z_n < 2000) \le 0.01. \tag{4.73}$$

By (4.72), $Z_n = N(50n, 64n)$. The corresponding standardization is 

$$Y_n = \frac{Z_n - 50n}{8\sqrt{n}}, \qquad Y_n = N(0, 1).$$

Hence, (4.73) can be written in the equivalent form 

$$P(Z_n < 2000) = P\left(Y_n < \frac{2000 - 50n}{8\sqrt{n}}\right) = \Phi\left(\frac{2000 - 50n}{8\sqrt{n}}\right) \le 0.01.$$

The 0.01-percentile of the standard normal distribution is -2.32, i.e., 
$\Phi(-2.32) = 0.01$. 
Hence, relation (4.73) is equivalent to 

$$\frac{2000 - 50n}{8\sqrt{n}} \le -2.32 \quad\text{or}\quad \frac{n - 40}{\sqrt{n}} \ge 0.3712.$$

By squaring and some simple algebra these relations are seen to be equivalent to 
$(n - 40.069)^2 \ge 5.5$ or $n \ge 42.41$. 
Hence, at least 43 wagons are needed. □
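
Instead of solving the quadratic inequality, the smallest n can be found by direct search. The sketch below is a hedged illustration: the function name `wagons_needed` and its default arguments are assumptions made for this example, and the exact normal distribution function is used rather than the rounded percentile -2.32.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wagons_needed(required=2000.0, mean=50.0, variance=64.0, level=0.99):
    """Smallest n with P(Z_n >= required) >= level, where Z_n = N(mean*n, variance*n)."""
    n = 1
    while True:
        p_shortfall = normal_cdf((required - mean * n) / math.sqrt(variance * n))
        if 1.0 - p_shortfall >= level:
            return n
        n += 1

print(wagons_needed())
```

With the data of Example 4.24 the search stops at the same number of wagons as the algebraic solution.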
4.3.3 Sums of a Random Number of Random Variables 


Frequently, sums of a random number of random variables have to be investigated. 
For instance, the total claim size an insurance company is confronted with in a year is 
the sum of a random number of random individual claim sizes. The total repair cost a 
machine causes in a year is the sum of a random number of random repair costs; the 
increase of a population in a year is determined by the random number of individuals 
producing children and the random number of children produced by an individual, etc. 


Wald's Identities Let $\{X_1, X_2, ...\}$ be a sequence of independent random variables, 
which are identically distributed as X with $E(X) < \infty$. Let further N be a positive, 
integer-valued random variable, which is independent of all $X_1, X_2, ...$ Then mean value 
and variance of the sum $Z = X_1 + X_2 + \cdots + X_N$ are given by Wald's identities: 

$$E(Z) = E(X) \cdot E(N), \tag{4.74}$$

$$\mathrm{Var}(Z) = \mathrm{Var}(X)\,E(N) + [E(X)]^2\,\mathrm{Var}(N). \tag{4.75}$$

The proof of these relations is easily done by conditioning: 

$$E(Z) = \sum_{n=1}^{\infty} E(X_1 + X_2 + \cdots + X_N \mid N = n)\,P(N = n) = \sum_{n=1}^{\infty} E(X_1 + X_2 + \cdots + X_n)\,P(N = n)$$

$$= \sum_{n=1}^{\infty} n\,E(X)\,P(N = n) = E(X) \sum_{n=1}^{\infty} n\,P(N = n) = E(X) \cdot E(N).$$

This proves (4.74). To verify (4.75), the second moment of Z is determined: 

$$E(Z^2) = \sum_{n=1}^{\infty} E(Z^2 \mid N = n)\,P(N = n) = \sum_{n=1}^{\infty} E\big([X_1 + X_2 + \cdots + X_n]^2\big)\,P(N = n).$$

By making use of formula (2.62), page 67, 

$$E(Z^2) = \sum_{n=1}^{\infty} \big\{\mathrm{Var}(X_1 + X_2 + \cdots + X_n) + [E(X_1 + X_2 + \cdots + X_n)]^2\big\}\,P(N = n)$$

$$= \sum_{n=1}^{\infty} \big\{n\,\mathrm{Var}(X) + n^2 [E(X)]^2\big\}\,P(N = n) = \mathrm{Var}(X) \sum_{n=1}^{\infty} n\,P(N = n) + [E(X)]^2 \sum_{n=1}^{\infty} n^2\,P(N = n)$$

$$= \mathrm{Var}(X)\,E(N) + [E(X)]^2\,E(N^2).$$

Hence, 

$$\mathrm{Var}(Z) = E(Z^2) - [E(Z)]^2 = \mathrm{Var}(X)\,E(N) + [E(X)]^2\,E(N^2) - [E(X)]^2\,[E(N)]^2 = \mathrm{Var}(X)\,E(N) + [E(X)]^2\,\mathrm{Var}(N).$$

This is the identity (4.75). 
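
Both identities can be verified exactly on a small discrete example by enumerating all outcomes. In the sketch below the distributions of X and N are hypothetical choices made only for illustration, and the helper names are not from the text; exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import product

def moments(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    second = sum(v * v * p for v, p in dist.items())
    return mean, second - mean**2

def random_sum_distribution(x_dist, n_dist):
    """Exact distribution of Z = X_1 + ... + X_N, N independent of the i.i.d. X_i."""
    z_dist = {}
    for n, p_n in n_dist.items():
        for outcome in product(x_dist.items(), repeat=n):
            z = sum(v for v, _ in outcome)
            p = p_n
            for _, q in outcome:
                p *= q                  # independence: probabilities multiply
            z_dist[z] = z_dist.get(z, 0) + p
    return z_dist

half, third = Fraction(1, 2), Fraction(1, 3)
x_dist = {1: half, 3: half}               # hypothetical distribution of the summands
n_dist = {1: third, 2: third, 3: third}   # hypothetical distribution of N

ex, vx = moments(x_dist)
en, vn = moments(n_dist)
ez, vz = moments(random_sum_distribution(x_dist, n_dist))
print(ez == ex * en, vz == vx * en + ex**2 * vn)
```

The enumeration grows exponentially in the largest value of N, so this check is only practical for very small supports.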


Wald's identities (4.74) and (4.75) remain valid if the assumption that N is independ- 
ent of all X; is somewhat weakened by introducing the concept of a stopping time. 
Definition 4.2 (stopping time) A positive, integer-valued random variable N is said 
to be a stopping time for the sequence of independent random variables $\{X_1, X_2, ...\}$ 
if the occurrence of the random event 'N = n' is completely determined by the finite 
sequence $X_1, X_2, ..., X_n$, and, therefore, independent of all $X_{n+1}, X_{n+2}, ...$, $n \ge 1$. ●
Note A random event A is said to be independent of a random variable X if the indicator 
variable of A is independent of X (see also example 3.14, page 146). 

Sometimes, a stopping time defined in this way is called a Markov time, and only a 


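finite Markov time is called a stopping time. (A random variable Y is said to be finite 
if $P(Y < \infty) = 1$. Note that this alone does not guarantee $E(Y) < \infty$; Wald's equation (4.76) below requires $E(N) < \infty$, see Example 4.25 b).) 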


The term 'stopping time' can be motivated as follows: The $X_1, X_2, ...$ are observed 
one after the other. As soon as the event 'N = n' occurs, the observation is stopped, 
i.e., the $X_{n+1}, X_{n+2}, ...$ will not be observed. 


Theorem 4.3 Let $\{X_1, X_2, ...\}$ be a sequence of random variables, which are identically 
distributed as X with $E(X) < \infty$, and let N be a finite stopping time for this 
sequence. Then, for $Z = X_1 + X_2 + \cdots + X_N$, 

$$E(Z) = E(X) \cdot E(N). \tag{4.76}$$

Proof Let binary random variables $Y_i$ be defined as follows: 

$$Y_i = \begin{cases} 1 & \text{if } N \ge i, \\ 0 & \text{if } N < i, \end{cases} \qquad i = 1, 2, ...$$

The event '$Y_i = 1$' occurs if and only if no stopping has been done after the observation 
of the i - 1 random variables $X_1, X_2, ..., X_{i-1}$. Since N is a stopping time, $Y_i$ is 
independent of the $X_i, X_{i+1}, ...$. Moreover, 

$$E(Y_i) = P(N \ge i) \quad\text{and}\quad E(X_i Y_i) = E(X_i)\,E(Y_i),$$

so that 

$$E\Big(\sum_{i=1}^{N} X_i\Big) = E\Big(\sum_{i=1}^{\infty} X_i Y_i\Big) = \sum_{i=1}^{\infty} E(X_i)\,E(Y_i) = E(X) \sum_{i=1}^{\infty} E(Y_i) = E(X) \sum_{i=1}^{\infty} P(N \ge i).$$

Now formula (2.9) at page 46 implies (4.76). ■


Example 4.25 a) Let $X_i = 1$ if the ith flip of a fair coin yields 'head' and $X_i = 0$ if the 
outcome is 'tail'. The $X_i$ are independent and identically distributed as 

$$X = \begin{cases} 1 & \text{if head occurs}, \\ 0 & \text{if tail occurs}. \end{cases}$$

Then, a finite stopping time for the sequence $X_1, X_2, ...$ is 

$$N = \min\{n;\ X_1 + X_2 + \cdots + X_n = 10\}. \tag{4.77}$$

Since E(X) = 1/2, 

$$E(X_1 + X_2 + \cdots + X_N) = \tfrac{1}{2} \cdot E(N).$$

According to the definition of N, 
$X_1 + X_2 + \cdots + X_N = 10$ 
so that E(N) = 20. 


b) Let $X_i = 1$ if the ith flip of a fair coin yields 'head' and $X_i = -1$ otherwise. Then 
N given by (4.77) is again a finite stopping time for $X_1, X_2, ...$. A formal application 
of Wald's equation yields 

$$E(X_1 + X_2 + \cdots + X_N) = E(X) \cdot E(N).$$

The left-hand side of this equation is equal to 10. The right-hand side contains the 
factor E(X) = 0. Therefore, Wald's equation (4.76) is not applicable; the contradiction shows that E(N) cannot be finite in this case. □
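
Part a) of Example 4.25 lends itself to a quick simulation. The sketch below is a minimal illustration, not part of the original example: the seed and the number of runs are arbitrary choices, and `stopping_time` is a hypothetical helper that realizes N from (4.77) for 0/1-valued flips.

```python
import random

def stopping_time(rng, target=10):
    """Number of fair-coin flips until `target` heads have been observed, cf. (4.77) a)."""
    heads = flips = 0
    while heads < target:
        heads += rng.random() < 0.5    # adds 1 for 'head', 0 for 'tail'
        flips += 1
    return flips

rng = random.Random(12345)             # fixed seed, chosen arbitrarily for reproducibility
runs = 20000
mean_n = sum(stopping_time(rng) for _ in range(runs)) / runs
print(mean_n)                          # should be close to E(N) = 20
```

Simulating part b) in the same way is not advisable without a cap on the number of flips, since the stopping time then has no finite mean.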


4.4 EXERCISES 


4.1 In a game reserve, the random position (X, Y) of a leopard has a uniform distribu- 
tion in a semicircle with radius r = 10 km (figure). Determine E(X) and E(Y). 


[Figure: Illustration to Exercise 4.1; axes x and y, with x ranging from -10 to 10 and y up to 10] 


4.2) From a circle with radius R = 9 and center (0,0) a point is randomly selected. 


(1) Determine the mean value of the distance of this point to the nearest point at the 
periphery of the circle. 


(2) Determine the mean value of the geometric mean of the random variables X and 


Y, ie. E( [XY ). 


4.3) X and Y are independent, exponentially with parameter 4 = 1 distributed random 
variables. Determine 


(1) E(X- ¥), 
(2) E(\X— Y|), and 
(3) distribution function and density of Z=X-— Y. 
4.4) X and Y are independent random variables with 
E(X) = E(Y) = 5, Var(X) = Var(Y) = 9, and let U = 2X + 3Y and V = 3X - 2Y. 
Determine E(U), E(V), Var(U), Var(V), Cov(U, V), and $\rho(U, V)$. 


4.5) X and Y are independent, in the interval [0,1] uniformly distributed random vari- 
ables. Determine the densities of 


(1) Z=min(X, Y), and (2) Z=XY. 


4.6) X and Y are independent and N(0, 1)-distributed. Determine the density $f_Z(z)$ of 
Z = X/Y. 
Which type of probability distributions does $f_Z(z)$ belong to? 


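4.7) X and Y are independent and identically Cauchy distributed with parameters 
$\lambda = 1$ and $\mu = 0$, i.e., they have densities (page 74) 

$$f_X(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad f_Y(y) = \frac{1}{\pi}\,\frac{1}{1 + y^2}, \qquad -\infty < x, y < +\infty.$$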


Verify that the sum Z=X+Y has a Cauchy distribution as well. 


4.8) The joint density of the random vector (X, Y) is 
$f_{X,Y}(x, y) = 6x^2 y$, $0 \le x, y \le 1$. 
Determine the distribution density of the product Z = XY. 


4.9) The random vector (X, Y) has the joint density 
$f_{X,Y}(x, y) = 2e^{-(x+y)}$ for $0 \le x \le y < \infty$. 
Determine the densities of Z = max(X, Y) and Z = min(X, Y). 


4.10) The resistance values X, Y, and Z of 3 resistors connected in series are assumed 
to be independent, normally distributed random variables with respective mean values 
200, 300, and 500 [Ω], and standard deviations 5, 10, and 20 [Ω]. 


(1) What is the probability that the total resistance exceeds 1020 [Ω]? 


(2) Determine that interval $[1000 - \varepsilon,\ 1000 + \varepsilon]$ to which the total resistance belongs 
with probability 0.95. 


4.11) A supermarket employs 24 shop assistants. 20 of them achieve an average daily 
turnover of $ 8000, whereas 4 achieve an average daily turnover of $ 10 000. The 
corresponding standard deviations are $ 2400 and $ 3000, respectively. The daily 
turnovers of all shop assistants are independent and have a normal distribution. Let Z 
be the daily total turnover of all shop assistants. 


(1) Determine E(Z) and Var(Z). 
(2) What is the probability that the daily total turnover Z is greater than $ 190 000? 
4.12) A helicopter is allowed to carry at most 8 persons given that their total weight 
does not exceed 620 kg. The weights of the passengers are independent, identically 
normally distributed random variables with mean value 76 kg and variance 324 kg². 


(1) What are the probabilities of exceeding the permissible load with 7 and 8 passen- 
gers, respectively? 

(2) What would the maximum total permissible load have to be to ensure that with 
probability 0.99 the helicopter will be allowed to fly 8 passengers? 


4.13) Let X be the height of the woman and Y be the height of the man in married 
couples in a certain geographical region. By analyzing a sufficiently large sample, a 
statistician found that the random vector (X, Y) has a joint normal distribution with 
parameters 


E(X) = 168 cm, Var(X) = 64 cm², E(Y) = 175 cm, Var(Y) = 100 cm², $\rho$ = 0.86. 


(1) Determine the probability P(X > Y) that in married couples in this area a wife is 
taller than her spouse. 

(2) Determine the same probability on condition that there is no correlation between 
X and Y, and interpret the result in comparison to (1). 


Hint If you do not want to use a statistical software package, make use of the fact that the 
desired probability has structure P(X > Y) = P(X + (-Y) > 0) and apply formula (4.48), page 185. 


4.14) A target, which is located at point (0,0) of the (x, y)-coordinate system, is subject 
to permanent shellfire. The random coordinates X and Y of the hitting point of a 
shell are independent and identically $N(0, \sigma^2)$-distributed. 


(1) Determine the distribution function $F_Z(z)$ of the random distance Z of a hitting 
shell (identified with its midpoint) to the target at (0,0). To what distribution type 
does $F_Z(z)$ belong? 

(2) Determine E(Z). 


CHAPTER 5 


Inequalities and Limit Theorems 


5.1 INEQUALITIES 


5.1.1 Inequalities for Probabilities 


Inequalities in probability theory are useful tools for estimating probabilities and mo- 
ments of random variables if their exact calculation is only possible with extremely 
high effort or is even impossible in view of incomplete information on the underlying 
probability distribution. In what follows, all occurring mean values and variances are 
assumed to be finite. 


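Inequality of Chebyshev (also called Bienaymé-Chebyshev inequality) For any random 
variable X with mean value $\mu = E(X)$, variance $\sigma^2 = \mathrm{Var}(X)$, and for any $\varepsilon > 0$, 

$$P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}. \tag{5.1}$$

To prove (5.1), assume for convenience that X has density f(x). Then, 

$$\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx \ \ge \int_{\{x,\ |x-\mu| \ge \varepsilon\}} (x - \mu)^2 f(x)\,dx \ \ge \int_{\{x,\ |x-\mu| \ge \varepsilon\}} \varepsilon^2 f(x)\,dx = \varepsilon^2\,P(|X - \mu| \ge \varepsilon).$$

This proves the two-sided Chebyshev inequality (5.1). The following one-sided 
Chebyshev inequality is proved analogously: 

$$P(X - \mu \ge \varepsilon) \le \frac{\sigma^2}{\sigma^2 + \varepsilon^2}.$$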


Corollary By letting $\varepsilon = n\sigma$, one gets from formula (5.1) the $n\sigma$-rules: 

$$P(|X - \mu| \ge n\sigma) \le 1/n^2 \quad\text{or}\quad P(|X - \mu| < n\sigma) \ge 1 - 1/n^2. \tag{5.2}$$


Example 5.1 The height X of trees in a forest stand has mean value $\mu$ = 20 m and 
standard deviation $\sigma$ = 2 m. To obtain an upper limit of the probability that the height 
of a tree differs at least 4 m from $\mu$, Chebyshev's inequality (5.1) is applied: 

$$P(|X - 20| \ge 4) \le 4/16 = 0.250.$$

For the sake of comparison, assume that the height of trees in this forest stand has a 
normal distribution. Then the exact probability that the height of a tree differs at least 
4 m from $\mu$ is 

$$P(|X - 20| \ge 4) = P(X - 20 \ge 4) + P(X - 20 \le -4) = 2\,\Phi(-2) = 0.046.$$

In this case Chebyshev's inequality gives a rather rough upper bound. On the other 
hand, this inequality requires little input. □


Example 5.2 Let $X_1, X_2, ..., X_n$ be the outcomes of n Bernoulli trials (pages 49, 51), 
with p = 1/6, i.e. 

$$X_i = \begin{cases} 1 & \text{with probability } 1/6, \\ 0 & \text{with probability } 5/6, \end{cases} \qquad\text{and}\qquad X = \sum_{i=1}^{n} X_i.$$

X can be interpreted as the number of occurrences of "6" when tossing a fair die 
n times. By making use of the Chebyshev inequality, the smallest integer $n = n_0$ with 
property 

$$P\left(\left|\frac{X}{n} - \frac{1}{6}\right| > 0.01\right) \le 0.05 \quad\text{for all } n \ge n_0$$

has to be found. Note that X/n is the relative frequency of the occurrence of "6" 
when tossing the die n times. Since X has a binomial distribution with 

$$\mu = E(X) = np = n/6 \quad\text{and}\quad \mathrm{Var}(X) = np(1 - p) = 5n/36,$$

X/n has mean 1/6 and variance $\sigma^2 = \mathrm{Var}(X/n) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{5}{36\,n}$. This implies 

$$P\left(\left|\frac{X}{n} - \frac{1}{6}\right| > 0.01\right) \le \frac{5}{36\,n\,(0.01)^2} \le 0.05.$$

Hence, $\frac{5}{(0.01)^2 \cdot 36 \cdot 0.05} \le n$ so that $n_0 = 27778$. □
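
The sample size of Example 5.2 follows from solving the Chebyshev bound for n. The sketch below is an illustration with a hypothetical function name; exact fractions are used so that the rounding-up step is not affected by floating-point error.

```python
from fractions import Fraction
import math

def chebyshev_sample_size(p, eps, alpha):
    """Smallest n with p(1-p)/(n*eps^2) <= alpha, the Chebyshev bound for a relative frequency."""
    bound = p * (1 - p) / (eps**2 * alpha)
    return math.ceil(bound)

n0 = chebyshev_sample_size(Fraction(1, 6), Fraction(1, 100), Fraction(5, 100))
print(n0)
```

Because the bound only uses mean and variance, the resulting n is a conservative requirement and typically much larger than what the exact binomial distribution would demand.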


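Inequalities of Gauss Let X be a continuous random variable with $\mu = E(X)$, $\sigma^2 = \mathrm{Var}(X)$, and 
unimodal density with mode $x_m$. Then the Gauss inequalities are 

$$P(|X - \mu| \ge \varepsilon) \le \frac{4\,[\sigma^2 + (\mu - x_m)^2]}{9\,(\varepsilon - |\mu - x_m|)^2}, \quad \varepsilon > |\mu - x_m|, \tag{5.3}$$

$$P(|X - x_m| \ge \varepsilon) \le \frac{4}{9\,\varepsilon^2}\,[\sigma^2 + (\mu - x_m)^2], \quad \varepsilon > 0. \tag{5.4}$$

(5.3) is also called Camp-Meidell inequality. 


For $\mu = x_m$, in particular for symmetric densities with symmetry center $\mu$, the inequalities 
(5.3) and (5.4) are identical. In this case one obtains an improvement of the Chebyshev 
inequality (but under the additional assumptions of the Gauss inequalities): 

$$P(|X - \mu| \ge \varepsilon) \le \frac{4\,\sigma^2}{9\,\varepsilon^2}. \tag{5.5}$$

Corollary By letting $\varepsilon = n\sigma$ and assuming unimodality with $\mu = x_m$, one gets from 
formula (5.3) or (5.4) the $n\sigma$-rules: 

$$P(|X - \mu| \ge n\sigma) \le \frac{4}{9n^2} \quad\text{or}\quad P(|X - \mu| < n\sigma) \ge 1 - \frac{4}{9n^2}; \quad n = 1, 2, ... \tag{5.6}$$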
9n 9n 
Table 5.1 compares the lower bounds for the probabilities $P(|X - \mu| < n\sigma)$, which are 
given by the $n\sigma$-rules (5.2) and (5.6), respectively, with the exact probabilities of the 
events '$|X - \mu| < n\sigma$', n = 1, 2, ..., 5, if X has a normal distribution $N(\mu, \sigma^2)$. 


P(|X - μ| ≤ nσ)         n = 1      n = 2      n = 3      n = 4      n = 5

Chebyshev inequality    ≥ 0        ≥ 0.750    ≥ 0.889    ≥ 0.938    ≥ 0.960
Gauss inequality        ≥ 0.556    ≥ 0.889    ≥ 0.951    ≥ 0.972    ≥ 0.982
Normal distribution     = 0.683    = 0.955    = 0.997    > 0.999    > 0.999


Table 5.1 Lower bounds (5.2) and (5.6) and exact values for normal distribution 
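
The entries of Table 5.1 can be recomputed directly from (5.2), (5.6), and the normal distribution function. The following sketch is illustrative only; the function names are assumptions, and the exact normal probability is expressed through the error function.

```python
import math

def chebyshev_lower(n):
    """Lower bound for P(|X - mu| < n*sigma) from (5.2)."""
    return 1.0 - 1.0 / n**2

def gauss_lower(n):
    """Lower bound for P(|X - mu| < n*sigma) from (5.6)."""
    return 1.0 - 4.0 / (9.0 * n**2)

def normal_exact(n):
    """Exact P(|X - mu| < n*sigma) for a normally distributed X."""
    return math.erf(n / math.sqrt(2.0))

for n in range(1, 6):
    print(n, round(chebyshev_lower(n), 4), round(gauss_lower(n), 4), round(normal_exact(n), 4))
```

The printed values agree with the table up to rounding in the last digit.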


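Inequalities of Markov Type Let y = h(x) be a nonnegative, strictly increasing function 
on $[0, \infty)$. Then, for any $\varepsilon > 0$, the general Markov inequality is 

$$P(|X| \ge \varepsilon) \le \frac{E(h(|X|))}{h(\varepsilon)}. \tag{5.7}$$

(5.7) is proved as follows: 

$$E(h(|X|)) = \int_{-\infty}^{+\infty} h(|y|)\,f(y)\,dy \ \ge \int_{\varepsilon}^{+\infty} h(|y|)\,f(y)\,dy + \int_{-\infty}^{-\varepsilon} h(|y|)\,f(y)\,dy$$

$$\ge h(\varepsilon) \int_{\varepsilon}^{+\infty} f(y)\,dy + h(\varepsilon) \int_{-\infty}^{-\varepsilon} f(y)\,dy = h(\varepsilon)\,P(|X| \ge \varepsilon),$$

which is equivalent to (5.7). Letting $h(x) = x^a$, a > 0, inequality (5.7) yields Markov's 
inequality as such: 

$$P(|X| \ge \varepsilon) \le \frac{E(|X|^a)}{\varepsilon^a}. \tag{5.8}$$

From (5.8) Chebyshev's inequality is obtained by letting a = 2 and replacing X with 
$X - \mu$. 

If $h(x) = e^{bx}$, b > 0, Markov's inequality (5.7) yields an exponential inequality: 

$$P(|X| \ge \varepsilon) \le e^{-b\varepsilon}\,E\big(e^{b|X|}\big). \tag{5.9}$$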


Markov's inequality (5.8) and the exponential inequality (5.9) are usually superior to 
Chebyshev's inequality, since, given X and $\varepsilon$, their right-hand sides can be minimized 
with respect to a and b. On the other hand, to determine the mean values in formulas 
(5.8) and (5.9), the probability distribution of X needs to be known. But in this case 
the exact value of the desired probability $P(|X| \ge \varepsilon)$ can be calculated anyway. Hence, 
application of (5.8) and (5.9) makes sense only if the expected values involved are 
known from some other source (e.g., expert opinions) or are estimated from a 
sample taken from X, i.e., the random experiment with output X is independently 
repeated n times to get a sequence of values of X: $x_1, x_2, ..., x_n$. For instance, the mean 
value $m = E(|X|^a)$ occurring in (5.8) would have to be estimated by the arithmetic 
mean of the $|x_i|^a$: 

$$\hat m = \frac{1}{n}\sum_{i=1}^{n} |x_i|^a.$$

If the variance $\sigma^2$ in (5.1) is unknown, it also has to be estimated from a sample 
$\{x_1, x_2, ..., x_n\}$. The estimator is 

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2 \quad\text{with}\quad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i.$$


Continuation of Example 5.1 Let us check whether the upper bound of Chebyshev's 
inequality (5.1) can be improved by (5.8) if X has a normal distribution with mean $\mu$ 
and standard deviation $\sigma = 2$. 


For a = 1, the mean value $E(|X - \mu|^a)$ becomes (see page 79) 

$$E(|X - \mu|) = \sqrt{\frac{2}{\pi}}\,\sigma \approx 0.798 \cdot 2 = 1.596.$$

Hence, (5.8) yields 

$$P(|X - \mu| \ge 4) \le \frac{E(|X - \mu|)}{4} = \frac{1.596}{4} = 0.399.$$

This is a worse result than the one given by Chebyshev's inequality (a = 2). 
Now let a = 4. Then (see page 83, note that $X - \mu$ has mean value 0) 

$$E(|X - \mu|^4) = \mu_4 = E((X - \mu)^4) = 3\sigma^4.$$

Hence, (5.8) yields 

$$P(|X - \mu| \ge 4) \le \frac{E(|X - \mu|^4)}{4^4} = \frac{3 \cdot 2^4}{4^4} = \frac{48}{256} = 0.1875.$$

This is a substantial improvement of the bound given by Chebyshev's inequality. □
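
The bounds for a = 1, 2, and 4 can be computed in one pass. This sketch relies on the standard closed form for the absolute central moments of a normal distribution, $E(|X-\mu|^a) = \sigma^a\, 2^{a/2}\, \Gamma((a+1)/2)/\sqrt{\pi}$, which is brought in here as an assumption from general theory rather than from the pages cited above; the function names are illustrative.

```python
import math

def abs_central_moment(a, sigma):
    """E(|X - mu|^a) for a normally distributed X with standard deviation sigma."""
    return sigma**a * 2.0 ** (a / 2.0) * math.gamma((a + 1.0) / 2.0) / math.sqrt(math.pi)

def markov_bound(a, sigma, eps):
    """Right-hand side of (5.8), applied to X - mu."""
    return abs_central_moment(a, sigma) / eps**a

sigma, eps = 2.0, 4.0                  # data of Example 5.1
bounds = {a: markov_bound(a, sigma, eps) for a in (1, 2, 4)}
exact = math.erfc(eps / (sigma * math.sqrt(2.0)))   # exact P(|X - mu| >= eps), normal case
print(bounds, exact)
```

Evaluating `markov_bound` on a finer range of exponents a shows how the bound first decreases and later increases again, which is the minimization over a mentioned in the text.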


5.1.2 Inequalities for Moments 


Inequalities of Chebyshev Let functions g(x) and h(x) be either both nonincreasing 
or both nondecreasing. Then, 

$$E[g(X)]\,E[h(X)] \le E[g(X)\,h(X)]. \tag{5.10}$$

If g is nonincreasing and h nondecreasing or vice versa, then 

$$E[g(X)]\,E[h(X)] \ge E[g(X)\,h(X)].$$

As an important special case, let 
$g(x) = x^r$ and $h(x) = x^s$; $r, s \ge 0$. 
Then, from (5.10), 

$$E(|X|^r)\,E(|X|^s) \le E(|X|^{r+s}).$$
Inequality of Schwarz 

$$[E(|XY|)]^2 \le E(|X|^2)\,E(|Y|^2). \tag{5.11}$$

Hölder's Inequality Let r and s be positive numbers satisfying $\frac{1}{r} + \frac{1}{s} = 1$. Then 

$$E(|XY|) \le [E(|X|^r)]^{1/r}\,[E(|Y|^s)]^{1/s}. \tag{5.12}$$

For r = s = 2, Hölder's inequality implies the inequality of Schwarz. 
Inequality of Minkowski (Triangle Inequality) For $r \ge 1$, 

$$[E(|X + Y|^r)]^{1/r} \le [E(|X|^r)]^{1/r} + [E(|Y|^r)]^{1/r}. \tag{5.13}$$

Inequality of Jensen Let h(x) be a convex (concave) function. Then, for any X, 

$$h(E(X)) \ \le\ (\ge)\ E(h(X)). \tag{5.14}$$

In particular, if X is nonnegative and $h(x) = x^a$ (convex for $a \ge 1$ and $a \le 0$, concave 
for $0 \le a \le 1$), $h(x) = e^x$ (convex), and $h(x) = \ln x$ (concave), the respective inequalities 
of Jensen are 

$$[E(X)]^a \le E(X^a) \quad\text{for } a > 1 \text{ or } a < 0,$$

$$[E(X)]^a \ge E(X^a) \quad\text{for } 0 < a < 1,$$

$$e^{E(X)} \le E(e^X), \tag{5.15}$$

$$\ln E(X) \ge E(\ln X).$$


Example 5.3 To get an impression of the sharpness of the inequalities of Schwarz 
and Minkowski, let us consider a random vector (X, Y) with joint density 

$$f_{X,Y}(x, y) = x + y, \quad 0 \le x, y \le 1,$$

and marginal densities (see example 3.5, page 129) 

$$f_X(x) = x + 1/2, \quad f_Y(y) = y + 1/2; \quad 0 \le x, y \le 1.$$

Schwarz inequality: The second moment of X is 

$$E(X^2) = \int_0^1 x^2 (x + 1/2)\,dx = 5/12.$$

For symmetry reasons, $E(Y^2) = 5/12$ as well. Thus, (5.11) yields 

$$[E(XY)]^2 \le 0.174$$

so that the upper bound for E(XY) is 0.417. For the sake of comparison, the exact 
value of E(XY) is 

$$E(XY) = \int_0^1\!\!\int_0^1 xy\,(x + y)\,dx\,dy = 2\int_0^1\!\!\int_0^1 x^2 y\,dx\,dy = 0.333.$$
Minkowski inequality: For r = 1, inequality (5.13) is trivial (left- and right-hand side 
are equal). Let r = 2. Then (5.13) becomes 

$$\sqrt{E(X + Y)^2} \le \sqrt{E(X^2)} + \sqrt{E(Y^2)}.$$

Since $E(X^2) = E(Y^2) = 5/12$, an upper bound for $\sqrt{E(X + Y)^2}$ is 1.291: 

$$\sqrt{E(X + Y)^2} \le 1.291.$$

For the sake of comparison: 

$$E(X + Y)^2 = \int_0^1\!\!\int_0^1 (x^2 + 2xy + y^2)(x + y)\,dx\,dy = \int_0^1\!\!\int_0^1 (x^3 + 3x^2 y + 3xy^2 + y^3)\,dx\,dy$$

$$= \int_0^1 \left(\tfrac{1}{4} + y + \tfrac{3}{2}\,y^2 + y^3\right) dy = \tfrac{1}{4} + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{4} = 1.5.$$

Hence, $\sqrt{E(X + Y)^2} = 1.225$. □
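
The numbers of Example 5.3 can be reproduced with exact moments. The sketch below uses the fact that, for the density x + y on the unit square, $E(X^i Y^j) = \frac{1}{(i+2)(j+1)} + \frac{1}{(i+1)(j+2)}$, which follows by integrating term by term; the helper name `moment` is a hypothetical choice for this illustration.

```python
from fractions import Fraction
import math

def moment(i, j):
    """E(X^i Y^j) for the joint density f(x, y) = x + y on the unit square."""
    return Fraction(1, (i + 2) * (j + 1)) + Fraction(1, (i + 1) * (j + 2))

e_xy = moment(1, 1)                                         # exact E(XY)
schwarz_bound = math.sqrt(moment(2, 0) * moment(0, 2))      # bound for E(XY) from (5.11)

second_moment_sum = moment(2, 0) + 2 * moment(1, 1) + moment(0, 2)    # E(X+Y)^2
minkowski_bound = math.sqrt(moment(2, 0)) + math.sqrt(moment(0, 2))   # bound from (5.13), r = 2

print(float(e_xy), schwarz_bound, math.sqrt(second_moment_sum), minkowski_bound)
```

In both cases the exact value lies within roughly 20 percent (Schwarz) and 5 percent (Minkowski) of the respective bound for this particular density.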


5.2 LIMIT THEOREMS 


5.2.1 Convergence Criteria for Sequences of Random Variables 


There are three large classes of limit theorems in probability theory: 1) the laws of 
large numbers, 2) the central limit theorem and its numerous modifications, and 
3) the local limit theorems. The laws of large numbers are essentially statements 
on the convergence behavior of arithmetic means of random variables. They constitute 
the theoretical foundation of statistical methods for the estimation of parameters 
of probability distributions based on samples. They also have applications in simula- 
tion procedures for the numerical solutions of stochastic and even deterministic prob- 
lems. The central limit theorem justifies the application of the normal distribution as 
distribution of random variables, which are known to arise by the additive superposi- 
tion of numerous random influences. Local limit theorems investigate the conver- 
gence of probability densities of continuous random variables and the convergence 
of the probabilities P(X = x;) of discrete random variables X. 


Limit theorems in probability theory are subject to certain convergence criteria for 
sequences of random variables, which next have to be introduced (even if in a more 
or less heuristic way). 


1) Convergence in Probability A sequence of random variables $\{X_1, X_2, \ldots\}$ converges in probability towards a random variable $X$ if for all $\varepsilon > 0$,

$$\lim_{i \to \infty} P(|X_i - X| > \varepsilon) = 0. \tag{5.16}$$

2) Convergence in Mean A sequence of random variables $\{X_1, X_2, \ldots\}$ with property
$$E(|X_i|) < \infty; \quad i = 1, 2, \ldots$$
converges in mean towards a random variable $X$ if
$$\lim_{i \to \infty} E(|X_i - X|) = 0 \quad \text{and} \quad E(|X|) < \infty. \tag{5.17}$$


3) Mean Square Convergence A sequence of random variables $\{X_1, X_2, \ldots\}$ with
$$E(|X_i|^2) < \infty; \quad i = 1, 2, \ldots$$
converges in mean square or in square mean towards a random variable $X$ if

$$\lim_{i \to \infty} E(|X_i - X|^2) = 0 \quad \text{and} \quad E(|X|^2) < \infty. \tag{5.18}$$


4) Convergence with Probability 1 A sequence of random variables $\{X_1, X_2, \ldots\}$ converges with probability 1 or almost surely towards a random variable $X$ if

$$P\left(\lim_{i \to \infty} X_i = X\right) = 1.$$


5) Convergence in Distribution Let the random variables $X_i$ have the distribution functions $F_{X_i}(x)$; $i = 1, 2, \ldots$. Then $\{X_1, X_2, \ldots\}$ converges in distribution towards a random variable $X$ with distribution function $F_X(x)$ if, for all points of continuity $x$ of $F_X(x)$,

$$\lim_{i \to \infty} F_{X_i}(x) = \lim_{i \to \infty} P(X_i \le x) = P(X \le x) = F_X(x).$$


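Figure 5.1 Relations between the convergence criteria 1-5.


Figure 5.1 shows the implications between the convergence criteria. The integers refer to the respective convergence criteria listed above: mean square convergence (3) implies convergence in mean (2); convergence in mean (2) and convergence with probability 1 (4) each imply convergence in probability (1); and convergence in probability (1) implies convergence in distribution (5).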


Under additional assumptions, the opposite implications may be true as well (in what follows, the notation $X_n \xrightarrow{k} X$ refers to the convergence criterion $k$ above):


1) If $X_n \xrightarrow{5} c$ is true with a finite constant $c$, then $X_n \xrightarrow{1} c$, i.e., in case of a constant limit, convergence in probability and convergence in distribution are equivalent.


2) If $X_n \xrightarrow{1} X$ is true, then there exists a subsequence $\{X_{n_1}, X_{n_2}, \ldots\}$ of the given sequence $\{X_1, X_2, \ldots\}$ so that $X_{n_i} \xrightarrow{4} X$ for $i \to \infty$.


5.2.2 Laws of Large Numbers


5.2.2.1 Weak Laws of Large Numbers 


There are weak and strong laws of large numbers. They essentially deal with the convergence behavior of arithmetic means $\bar X_n$ for $n \to \infty$, where

$$\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$


Definition 5.1 A sequence of random variables $\{X_1, X_2, \ldots\}$ satisfies the weak law of large numbers if there exists a sequence of real numbers $\{a_1, a_2, \ldots\}$ so that the sequence $\{X_1 - a_1, X_2 - a_2, \ldots\}$ converges in probability towards 0. ●


A direct consequence of Chebyshev's inequality (5.1) is the following version of the weak law of large numbers.


Theorem 5.1 Let $\{X_1, X_2, \ldots\}$ be a sequence of independent, identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$. Then the sequence of arithmetic means $\{\bar X_1, \bar X_2, \ldots\}$ converges in probability towards $\mu$, i.e., for all $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|\bar X_n - \mu| > \varepsilon) = 0.$$
Proof In view of $Var(\bar X_n) = \sigma^2 / n$, Chebyshev's inequality (5.1) yields
$$P(|\bar X_n - \mu| > \varepsilon) \le \frac{\sigma^2}{n\,\varepsilon^2}. \tag{5.19}$$
Letting $n \to \infty$ proves the theorem. ■


Bernoulli's Weak Law of Large Numbers The first version of the weak law of large numbers can be found in Bernoulli (1713), the first textbook on probability theory. Jacob Bernoulli considered the limit behavior of the sequence $\{X_1, X_2, \ldots\}$, where the $X_i$ are the indicator variables for the occurrence of a random event $A$ in a series of $n$ independent trials:

$$X_i = \begin{cases} 1 & \text{if } A \text{ occurs}, \\ 0 & \text{otherwise}; \end{cases} \quad i = 1, 2, \ldots$$

The sum $Z_n = \sum_{i=1}^{n} X_i$ is the number of occurrences of the random event $A$ in this series, and the arithmetic mean
$$\hat p_n(A) = \bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
is the relative frequency of the occurrence of event $A$ in a series of $n$ trials. From section 2.2.2, page 51, we know that $Z_n$ has a binomial distribution with parameters $n$ and $p = P(A)$ so that

$$E(Z_n) = np \quad \text{and} \quad Var(Z_n) = np(1-p).$$
Therefore, the relative frequency $\hat p_n(A)$ has mean value
$$E(\hat p_n(A)) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n}\,(n\,P(A)) = P(A) = p$$
and variance
$$Var(\hat p_n(A)) = \frac{p(1-p)}{n}.$$
Now, applying (5.1) to the sequence $\{\hat p_1(A), \hat p_2(A), \ldots\}$ yields for all $\varepsilon > 0$,

$$P(|\hat p_n(A) - P(A)| > \varepsilon) \le \frac{p(1-p)}{n\,\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$
This proves Bernoulli's weak law of large numbers:


The relative frequency $\hat p_n(A)$ of the occurrence of the random event $A$ in a series of $n$ independent trials converges to $p = P(A)$ in probability as $n \to \infty$:

$$\lim_{n \to \infty} \hat p_n(A) = P(A).$$
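How conservative the Chebyshev bound behind Bernoulli's law is can be seen by comparing it with the exact binomial probability of a deviation larger than $\varepsilon$. The sketch below is illustrative only: the values $p = 0.3$ and $\varepsilon = 0.05$ and the function names are assumptions, not taken from the text.

```python
from math import comb

def deviation_probability(n, p, eps):
    """Exact P(|Z_n / n - p| > eps) for a binomial Z_n with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) > eps)

def chebyshev_bound(n, p, eps):
    """Upper bound p(1 - p) / (n eps^2), capped at 1."""
    return min(1.0, p * (1 - p) / (n * eps**2))

p, eps = 0.3, 0.05          # illustrative values, not from the text
for n in (50, 200, 800):
    print(n, round(deviation_probability(n, p, eps), 4),
          round(chebyshev_bound(n, p, eps), 4))
```

Both columns tend to 0 as $n$ grows, but the exact probability does so much faster than the bound, which is only guaranteed to decay like $1/n$.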
Two more variants of the weak law of large numbers will be added.


Theorem 5.2 (Chebyshev) Let $\{X_1, X_2, \ldots\}$ be a sequence of (not necessarily independent) random variables $X_i$ with finite means $\mu_i = E(X_i)$; $i = 1, 2, \ldots$. On condition

$$\lim_{i \to \infty} Var(X_i) = 0,$$
the sequence $\{X_1 - \mu_1, X_2 - \mu_2, \ldots\}$ converges in probability towards 0. ■


The following theorem does not need assumptions on variances. Instead, the pairwise (not the complete, page 145) independence of the sequence $\{X_1, X_2, \ldots\}$ is required, i.e., $X_i$ and $X_j$ are independent for $i \ne j$.


Theorem 5.3 (Chintchin) Let $\{X_1, X_2, \ldots\}$ be a sequence of pairwise independent, identically distributed random variables with finite mean $\mu$. Then the corresponding sequence of arithmetic means $\{\bar X_1, \bar X_2, \ldots\}$ converges in probability towards $\mu$. ■


5.2.2.2 Strong Laws of Large Numbers 


These laws of large numbers are called strong, since almost sure convergence implies convergence in probability (Figure 5.1). Thus, almost sure convergence is a stronger property than convergence in probability.


Definition 5.2 A sequence of random variables $\{X_1, X_2, \ldots\}$ satisfies the strong law of large numbers if there is a sequence of real numbers $\{a_1, a_2, \ldots\}$ so that the sequence $\{X_1 - a_1, X_2 - a_2, \ldots\}$ converges with probability 1 towards 0:

$$P\left(\lim_{i \to \infty} (X_i - a_i) = 0\right) = 1.$$ ●

If a sequence of random variables satisfies the strong law of large numbers with a sequence of real numbers $\{a_1, a_2, \ldots\}$, then it satisfies the weak law of large numbers with the same sequence of real numbers. The converse is generally not true. Here two versions of the strong law of large numbers are given.


Theorem 5.4 (Kolmogorov) Let $\{X_1, X_2, \ldots\}$ be a sequence of independent, identically distributed random variables with finite mean $\mu$. Then the sequence of arithmetic means $\{\bar X_1, \bar X_2, \ldots\}$ converges with probability 1 towards $\mu$. ■


Theorem 5.4 implies that the sequence of relative frequencies $\{\hat p_1(A), \hat p_2(A), \ldots\}$ also converges towards $p = P(A)$ with probability 1. Thus, Bernoulli's law of large numbers is both weak and strong. The following theorem abandons the assumption of identically distributed random variables.


Theorem 5.5 (Kolmogorov) Let $\{X_1, X_2, \ldots\}$ be a sequence of independent random variables with parameters $\mu_i = E(X_i)$ and $\sigma_i^2 = Var(X_i)$; $i = 1, 2, \ldots$. On condition
$$\sum_{i=1}^{\infty} (\sigma_i / i)^2 < \infty,$$
the sequence $\{Y_1, Y_2, \ldots\}$ with
$$Y_n = \bar X_n - \frac{1}{n} \sum_{i=1}^{n} \mu_i$$
converges with probability 1 towards 0. ■


5.2.3 Central Limit Theorem 


The central limit theorem provides theoretical reasons for the significant role of the normal distribution in probability theory and its applications. Intuitively, it states that a random variable, which arises from additive superposition of many random influences with none of them being dominant, has approximately a normal distribution. The simplest version of the central limit theorem is the following one:


Theorem 5.6 (Lindeberg and Lévy) Let $Z_n = X_1 + X_2 + \cdots + X_n$ be the sum of $n$ independent, identically distributed random variables $X_i$ with finite mean $E(X_i) = \mu$ and finite variance $Var(X_i) = \sigma^2$, and let $S_n$ be the standardization of $Z_n$, i.e.

$$S_n = \frac{Z_n - n\mu}{\sigma \sqrt{n}}.$$

Then,
$$\lim_{n \to \infty} P(S_n \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du = \Phi(x),$$

where $\Phi(x)$ is the distribution function of the standard normal distribution $N(0,1)$. ■

Corollary Under the conditions of theorem 5.6, $Z_n$ has for sufficiently large $n$ approximately a normal distribution with mean value $n\mu$ and variance $n\sigma^2$:

$$Z_n \approx N(n\mu, n\sigma^2). \tag{5.20}$$
Thus, $Z_n$ is asymptotically normally distributed as $n \to \infty$. The fact that $Z_n$ has mean value $n\mu$ and variance $n\sigma^2$ follows from (4.57), page 188.
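A rough numerical impression of the quality of approximation (5.20) can be obtained in a case where the exact distribution of $Z_n$ is known. The sketch below assumes, purely for illustration, exponentially distributed summands with $\mu = \sigma = 1$; their sum has an Erlang distribution with an elementary distribution function. The function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

def erlang_cdf(n, x):
    """Exact P(Z_n <= x) for the sum Z_n of n independent Exp(1) variables."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 0.0          # term runs through x^k / k!, k = 0..n-1
    for k in range(n):
        total += term
        term *= x / (k + 1)
    return 1.0 - exp(-x) * total

def normal_cdf(n, x):
    """Approximation (5.20): Z_n ~ N(n*mu, n*sigma^2) with mu = sigma = 1."""
    return NormalDist(mu=n, sigma=sqrt(n)).cdf(x)

def max_error(n):
    """Largest deviation on a grid covering n +/- 3 standard deviations."""
    grid = [n + sqrt(n) * j / 10 for j in range(-30, 31)]
    return max(abs(erlang_cdf(n, x) - normal_cdf(n, x)) for x in grid)

for n in (5, 20, 80):
    print(n, round(max_error(n), 4))
```

For this skewed summand distribution the maximum error shrinks only slowly as $n$ grows, so how many summands are enough depends on the accuracy required and on the shape of the summand distribution.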


As a rule of thumb, (5.20) gives satisfactory results if $n \ge 20$. Sometimes even $n = 10$ is sufficient. The following theorem shows that the assumptions of theorem 5.6 can be partially weakened.


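Theorem 5.7 (Lindeberg and Feller) Let $Z_n = X_1 + X_2 + \cdots + X_n$ be the sum of independent random variables $X_i$ with densities $f_{X_i}(x)$, finite means $\mu_i = E(X_i)$, and finite variances $\sigma_i^2 = Var(X_i)$. Let further $S_n$ be the standardization of $Z_n$:

$$S_n = \frac{Z_n - E(Z_n)}{\sqrt{Var(Z_n)}} = \frac{Z_n - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}.$$

Then the limit relation

$$\lim_{n \to \infty} P(S_n \le x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du \tag{5.21}$$

is uniformly true for all $x$ and $Var(Z_n)$ has the properties

$$\lim_{n \to \infty} \sqrt{Var(Z_n)} = \infty \quad \text{and} \quad \lim_{n \to \infty} \max_{i=1,2,\ldots,n} \frac{\sigma_i}{\sqrt{Var(Z_n)}} = 0 \tag{5.22}$$
if and only if the Lindeberg condition

$$\lim_{n \to \infty} \frac{1}{Var(Z_n)} \sum_{i=1}^{n} \int_{\{x:\,|x - \mu_i| > \varepsilon \sqrt{Var(Z_n)}\}} (x - \mu_i)^2 f_{X_i}(x)\,dx = 0$$

is fulfilled for all $\varepsilon > 0$. ■


The properties (5.22) imply that no term $X_i$ in the sum dominates the rest and that for $n \to \infty$ the contributions of the $X_i$ to the sum uniformly tend to 0. Under the assumptions of theorem 5.6, the $X_i$ a priori have this property.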


Example 5.4 On weekdays, a car dealer sells on average one car (of a certain make) per $\mu = 2.4$ days with a standard deviation of $\sigma = 1.6$.

1) What is the probability that the dealer sells at least 35 cars a quarter (75 weekdays)?
Let $X_i$; $i = 1, 2, \ldots$; $X_0 = 0$, be the time span between selling the $(i-1)$th and the $i$th car. Then $Z_n = X_1 + X_2 + \cdots + X_n$ is the time point at which the $n$th car is sold (selling times assumed to be negligibly small). Hence, the probability $P(Z_{35} \le 75)$ has to be determined.
If the $X_i$ are assumed to be independent, then
$$E(Z_{35}) = 35 \cdot 2.4 = 84 \quad \text{and} \quad Var(Z_{35}) = 35 \cdot 1.6^2 = 89.6.$$
In view of (5.20), $Z_{35}$ has approximately an $N(84, 89.6)$-distribution. Hence,

$$P(Z_{35} \le 75) \approx \Phi\left(\frac{75 - 84}{9.466}\right) = \Phi(-0.95) = 0.171.$$


2) How many cars $n_{\min}$ does the dealer have to stock at least at the beginning of a quarter to make sure that every customer can immediately buy a car with a probability of not smaller than 0.95?


$n = n_{\min}$ is the smallest $n$ with property that
$$P(Z_{n+1} > 75) \ge 0.95.$$
Equivalently, $n_{\min}$ is the smallest $n$ with property

$$P(Z_{n+1} \le 75) \le 0.05 \quad \text{or} \quad \Phi\left(\frac{75 - 2.4\,(n+1)}{1.6 \sqrt{n+1}}\right) \le 0.05.$$

Since the 0.05-percentile of an $N(0, 1)$-distribution is $x_{0.05} = -1.64$, the latter inequality is equivalent to

$$\frac{75 - 2.4\,(n+1)}{1.6 \sqrt{n+1}} \le -1.64 \quad \text{or} \quad (n - 30.85)^2 \ge 37.7.$$

Hence, $n_{\min} = 37$. □
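Both parts of example 5.4 can be retraced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi$; the function names and the brute-force search over $n$ are illustrative choices, and the rounded percentile $-1.64$ from the text is the default.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, DAYS = 2.4, 1.6, 75      # days per sale, standard deviation, weekdays
std_normal = NormalDist()

def prob_nth_sale_by(n, t):
    """Normal approximation (5.20) of P(Z_n <= t)."""
    return std_normal.cdf((t - n * MU) / (SIGMA * sqrt(n)))

def min_stock(z=-1.64):
    """Smallest n with (75 - mu (n+1)) / (sigma sqrt(n+1)) <= z."""
    n = 0
    while (DAYS - MU * (n + 1)) / (SIGMA * sqrt(n + 1)) > z:
        n += 1
    return n

print(round(prob_nth_sale_by(35, DAYS), 3))   # part 1)
print(min_stock())                            # part 2), rounded percentile
print(min_stock(std_normal.inv_cdf(0.05)))    # part 2), unrounded percentile
```

With the rounded percentile $-1.64$ the search returns 37 as in the text. With the unrounded percentile (about $-1.645$) it returns 38, so the result is a borderline case that is sensitive to rounding.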


Normal Approximation to the Binomial Distribution Any random variable $Z_n$ that is binomially distributed with parameters $n$ and $p$ can be represented as the sum of $n$ independent (0,1)-random variables of structure

$$X_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1 - p; \end{cases} \quad 0 < p < 1.$$

Thus, $Z_n = X_1 + X_2 + \cdots + X_n$ so that the assumptions of central limit theorem 5.6 are fulfilled with $\mu = p$ and $\sigma^2 = p(1-p)$, and

$$E(Z_n) = np, \quad Var(Z_n) = np(1-p). \tag{5.23}$$
A corollary of theorem 5.6 is


Theorem 5.8 (Central limit theorem of Moivre-Laplace) If the random variable $X$ has a binomial distribution with parameters $n$ and $p$, then, for all $x$,

$$\lim_{n \to \infty} P\left(\frac{X - np}{\sqrt{np(1-p)}} \le x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$$ ■

Figure 5.2 Approximation of the normal distribution to the binomial distribution


As a special case of formula (5.20), $Z_n$ has approximately a normal distribution:

$$Z_n \approx N(np, np(1-p)).$$
Thus,

$$P(i_1 \le Z_n \le i_2) \approx \Phi\left(\frac{i_2 + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{i_1 - \frac{1}{2} - np}{\sqrt{np(1-p)}}\right); \quad 0 \le i_1 \le i_2 \le n,$$

$$P(Z_n = i) = \binom{n}{i} p^i (1-p)^{n-i} \approx \Phi\left(\frac{i + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{i - \frac{1}{2} - np}{\sqrt{np(1-p)}}\right), \quad 0 \le i \le n. \tag{5.24}$$


The term $\pm 1/2$ is called continuity correction. It improves the accuracy of the approximation, since a discrete distribution is approximated by a continuous one. Because the distribution function of $Z_n$ has only jumps at integers $i$, there is

$$F_{Z_n}(i) = F_{Z_n}(i + \tfrac{1}{2}), \quad i = 0, 1, \ldots, n.$$


The larger $n$ is and the closer $p$ is to 1/2, the better the approximation formulas (5.24) are. Because the normal distribution is used to approximate the distribution of a nonnegative random variable, the condition

$$E(Z_n) > 3 \sqrt{Var(Z_n)} \tag{5.25}$$
should be satisfied (see page 79, there written as $\mu > 3\sigma$) to make sure the approximation yields satisfactory results. In view of (5.23), this condition is equivalent to

$$n > 9\,\frac{1-p}{p}. \tag{5.26}$$
Thus, for $p = 1/2$, only 10 summands may be sufficient to get good approximations, whereas for $p = 0.1$ the number $n$ required is at least 82. In practice the following rules of thumb will usually do:


$$E(Z_n) = np > 35 \quad \text{and/or} \quad Var(Z_n) = np(1-p) > 10.$$
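The two sample sizes quoted above follow from solving $E(Z_n) > 3\sqrt{Var(Z_n)}$ for $n$. The sketch below assumes that this condition is meant as a strict inequality, which is the reading that reproduces the figures 10 and 82; the function name is hypothetical. Exact fractions avoid floating-point artifacts at the boundary.

```python
from fractions import Fraction
from math import floor

def min_summands(p):
    """Smallest integer n with n > 9 (1 - p) / p, i.e. np > 3 sqrt(np(1 - p)).

    p is passed as a string or Fraction so that the boundary is computed exactly.
    """
    p = Fraction(p)
    return floor(9 * (1 - p) / p) + 1

print(min_summands("0.5"))   # 10
print(min_summands("0.1"))   # 82
```

For small $p$ the required $n$ grows roughly like $9/p$: for $p = 0.02$ the same function returns 442.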


Continuation of Example 2.5 (page 52) From a large delivery of calculators a sample of size $n = 100$ is taken. The delivery will be accepted if there are at most four defective calculators in the sample. The average rate of defective calculators from the producer is known to be 2%.


1) What is the probability $P_{risk}$ that the delivery will be rejected (producer's risk)?


2) What is the probability $C_{risk}$ to accept the delivery although it contains 7% defective calculators (consumer's risk)?


1) The underlying binomial distribution has parameters $n = 100$ and $p = 0.02$:
$$p_i = P(Z_{100} = i) = \binom{100}{i} (0.02)^i (0.98)^{100-i}, \quad i = 0, 1, \ldots, 100.$$


The random number $Z_{100}$ of defective calculators in the sample has mean value and standard deviation

$$E(Z_{100}) = 2 \quad \text{and} \quad \sqrt{Var(Z_{100})} = \sqrt{100 \cdot 0.02 \cdot 0.98} = 1.4.$$


The exact value is
$$P_{risk} = 1 - p_0 - p_1 - p_2 - p_3 - p_4 = 0.051.$$

The normal approximation with continuity correction gives

$$P_{risk} = P(Z_{100} \ge 5) \approx P\left(\frac{Z_{100} - 2}{1.4} \ge \frac{5 - 2 - 0.5}{1.4}\right) = 1 - \Phi(1.786) = 1 - 0.962 = 0.038.$$


This approximative value is not satisfactory since $p$ is too small. Condition (5.26) is far from being fulfilled.


2) In this case, $p = 0.07$ so that
$$E(Z_{100}) = 7 \quad \text{and} \quad \sqrt{Var(Z_{100})} = 2.551.$$


This gives for $C_{risk}$ the approximative value

$$C_{risk} = P(Z_{100} \le 4) \approx P\left(\frac{Z_{100} - 7}{2.551} \le \frac{4 + 0.5 - 7}{2.551}\right) = \Phi(-0.980) = 0.164.$$
The exact value is 0.163.


Taking the continuity correction into account proved essential for calculating the approximative values of both $P_{risk}$ and $C_{risk}$. □
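The producer's and consumer's risks of this example can be recomputed directly. The sketch below uses only the Python standard library; the function names are illustrative, and the normal approximation is the one with continuity correction from (5.24).

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(Z_n <= k) for a binomial Z_n with parameters n and p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf_cc(k, n, p):
    """Normal approximation of P(Z_n <= k) with continuity correction."""
    return NormalDist().cdf((k + 0.5 - n * p) / sqrt(n * p * (1 - p)))

n = 100
producer_exact = 1 - binom_cdf(4, n, 0.02)       # reject although p = 2%
producer_approx = 1 - normal_cdf_cc(4, n, 0.02)
consumer_exact = binom_cdf(4, n, 0.07)           # accept although p = 7%
consumer_approx = normal_cdf_cc(4, n, 0.07)

print(round(producer_exact, 3), round(producer_approx, 3))
print(round(consumer_exact, 3), round(consumer_approx, 3))
```

The relative error is large for $p = 0.02$ and small for $p = 0.07$, in line with the remark that the approximation deteriorates when $p$ is far from 1/2 and $np$ is small.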
Normal Approximation to the Poisson Distribution From example 4.18 (page 180) or from Theorem 7.7 (page 285) we know that the sum of independent, Poisson distributed random variables has a Poisson distribution, the parameter of which is the sum of the parameters of the Poisson distributions of these random variables. This implies that every random variable $X$ that is Poisson distributed with parameter $\lambda$ can be represented as a sum $Z_n$ of $n$ independent random variables $X_i$, each Poisson distributed with parameter $\lambda/n$:

$$X = Z_n = X_1 + X_2 + \cdots + X_n, \quad n = 1, 2, \ldots, \tag{5.27}$$
with
$$P(X_i = k) = \frac{(\lambda/n)^k}{k!}\,e^{-\lambda/n}; \quad k = 0, 1, \ldots$$
and $E(X_i) = Var(X_i) = \lambda/n$; $i = 1, 2, \ldots, n$.


Random variables $X$ (or, equivalently, their probability distributions), which can be represented for any integer $n > 1$ as the sum of $n$ independent, identically distributed random variables, are called infinitely divisible. Other probability distributions, which have this property, are the normal, the Cauchy, and the gamma distribution.


$X$ as given by the sum (5.27) is Poisson distributed with parameters
$$E(X) = \lambda \quad \text{and} \quad Var(X) = \lambda.$$


Since the sum representation (5.27) satisfies the assumptions of the central limit theorem 5.6, $X$ has approximately the normal distribution

$$X \approx N(\lambda, \lambda), \quad F_X(x) \approx \Phi\left(\frac{x - \lambda}{\sqrt{\lambda}}\right)$$

so that, using the continuity correction 1/2 as in case of the normal approximation to the binomial distribution,

$$P(i_1 \le X \le i_2) \approx \Phi\left(\frac{i_2 + \frac{1}{2} - \lambda}{\sqrt{\lambda}}\right) - \Phi\left(\frac{i_1 - \frac{1}{2} - \lambda}{\sqrt{\lambda}}\right),$$

$$P(X = i) = \frac{\lambda^i}{i!}\,e^{-\lambda} \approx \Phi\left(\frac{i + \frac{1}{2} - \lambda}{\sqrt{\lambda}}\right) - \Phi\left(\frac{i - \frac{1}{2} - \lambda}{\sqrt{\lambda}}\right). \tag{5.28}$$


Since the distribution of a nonnegative random variable is approximated by the normal distribution, analogously to (5.25), the assumption

$$E(X) = \lambda > 3 \sqrt{Var(X)} = 3 \sqrt{\lambda}$$

has to be made. Hence, the normal approximation to the Poisson distribution can only be expected to yield good results if $\lambda > 9$.
Continuation of Example 2.8 (page 56). Let $X$ be the random number of staff of a company being on sick leave a day. Long-term observations have shown that $X$ has a Poisson distribution with parameter $\lambda = E(X) = 10$.


What is the probability that the number of staff being on sick leave a day is 9, 10, or 11? The normal approximation to this probability is

$$P(9 \le X \le 11) \approx \Phi\left(\frac{11 + \frac{1}{2} - 10}{\sqrt{10}}\right) - \Phi\left(\frac{9 - \frac{1}{2} - 10}{\sqrt{10}}\right)$$
$$= \Phi(0.474) - \Phi(-0.474) = 2\,\Phi(0.474) - 1 = 0.364.$$


This value almost coincides with the exact one, which is 0.3639. Again, making use of the continuity correction is crucial for obtaining a good result. The approximation for $p_{10}$, for instance, is

$$p_{10} = \frac{10^{10}}{10!}\,e^{-10} \approx \Phi\left(\frac{1}{2\sqrt{10}}\right) - \Phi\left(-\frac{1}{2\sqrt{10}}\right) = 2\,\Phi(0.158) - 1 = 0.1255.$$


The exact value is 0.1251. □
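The sick-leave probabilities can be recomputed without tables. The following sketch evaluates the exact Poisson probabilities and the normal approximation with continuity correction for $\lambda = 10$; the function names are illustrative. Small differences in the last digit compared with the text come from rounding the arguments of $\Phi$.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

LAM = 10

def poisson_pmf(k, lam):
    """Exact Poisson probability P(X = k)."""
    return lam**k / factorial(k) * exp(-lam)

def normal_approx(i1, i2, lam):
    """Normal approximation of P(i1 <= X <= i2) with continuity correction."""
    z = NormalDist(mu=lam, sigma=sqrt(lam))
    return z.cdf(i2 + 0.5) - z.cdf(i1 - 0.5)

exact = sum(poisson_pmf(k, LAM) for k in (9, 10, 11))
approx = normal_approx(9, 11, LAM)

print(round(exact, 4), round(approx, 4))
print(round(poisson_pmf(10, LAM), 4), round(normal_approx(10, 10, LAM), 4))
```

Dropping the terms `+ 0.5` and `- 0.5` in `normal_approx` makes the approximation of the single probability $p_{10}$ collapse to 0, which shows why the continuity correction cannot be omitted here.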


5.2.4 Local Limit Theorems


The central limit theorems investigate the convergence of distribution functions of sums of random variables towards a limit distribution function. The local limit theorems consider the convergence of probabilities $P(Z = x_i)$ towards a limit probability if $Z$ is the sum of discrete random variables, or they deal with the convergence behavior of the densities of sums of continuous random variables. This section presents theorems of this type without proof or with only a short proof.


Theorem 5.9 (Local limit theorem of Moivre-Laplace) Let the random variable $X$ have a binomial distribution with parameters $n$ and $p$:

$$P(X = i) = b(i; n, p) = \binom{n}{i} p^i (1-p)^{n-i}; \quad i = 0, 1, \ldots, n.$$

Then,

$$\lim_{n \to \infty} \left[\sqrt{np(1-p)}\;b(i; n, p) - \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{i - np}{\sqrt{np(1-p)}}\right)^2\right)\right] = 0.$$

The convergence is uniform with regard to $i = 0, 1, \ldots, n$. ■

Theorem 5.9 implies that for sufficiently large $n$ an acceptable approximation for the probability $b(i; n, p)$ is

$$b(i; n, p) \approx \frac{1}{\sqrt{2\pi}\,\sqrt{np(1-p)}} \exp\left(-\frac{1}{2} \left(\frac{i - np}{\sqrt{np(1-p)}}\right)^2\right). \tag{5.29}$$

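Theorem 5.10 (Poisson approximation to the binomial distribution) If the parameters $n$ and $p$ of the binomial distribution tend to $\infty$ and 0, respectively, in such a way that their product $np$ stays constant $\lambda$, $\lambda > 0$, then
$$\lim_{\substack{n \to \infty,\ p \to 0 \\ np = \lambda}} b(i; n, p) = \frac{\lambda^i}{i!}\,e^{-\lambda}; \quad i = 0, 1, \ldots$$


Proof From the definition of the binomial coefficient $\binom{n}{i}$ (see formula (1.5)),
$$\frac{b(i; n, p)}{b(i-1; n, p)} = \frac{n - i + 1}{i} \cdot \frac{p}{1-p} = \frac{np}{i\,(1-p)} - \left(1 - \frac{1}{i}\right) \frac{p}{1-p}. \tag{5.30}$$
After having taken the limit, the $b(i; n, p)$ can no longer depend on $n$ and $p$, but are only functions of $i$ and $\lambda$, which are denoted as $h(i, \lambda)$. From (5.30),

$$\lim_{\substack{n \to \infty,\ p \to 0 \\ np = \lambda}} \frac{b(i; n, p)}{b(i-1; n, p)} = \frac{h(i, \lambda)}{h(i-1, \lambda)} = \frac{\lambda}{i}; \quad i = 1, 2, \ldots$$

Therefore, the limit probabilities of the binomial distribution satisfy
$$h(i, \lambda) = \frac{\lambda}{i}\,h(i-1, \lambda); \quad i = 1, 2, \ldots$$
For $i = 1$ and $i = 2$, this functional equation becomes
$$h(1, \lambda) = \lambda\,h(0, \lambda) \quad \text{and} \quad h(2, \lambda) = \frac{\lambda}{2}\,h(1, \lambda) = \frac{\lambda^2}{2!}\,h(0, \lambda).$$
Induction yields
$$h(i, \lambda) = \frac{\lambda^i}{i!}\,h(0, \lambda).$$
The normalizing condition (2.6) at page 43 gives the still unknown constant $h(0, \lambda)$:
$$\sum_{i=0}^{\infty} h(i, \lambda) = h(0, \lambda) \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = h(0, \lambda)\,e^{\lambda} = 1$$
so that $h(0, \lambda) = e^{-\lambda}$. This completes the proof of the theorem:

$$h(i, \lambda) = \frac{\lambda^i}{i!}\,e^{-\lambda}; \quad i = 0, 1, \ldots$$ ■


Note: The result of this theorem is formula (2.40) at page 57.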
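The limit in theorem 5.10 can be watched numerically by keeping $np = \lambda$ fixed while $n$ grows. The sketch below uses the illustrative value $\lambda = 2$ (an assumption, not from the text) and reports the largest pointwise difference between binomial and Poisson probabilities for $i = 0, 1, \ldots, 20$.

```python
from math import comb, exp, factorial

def binom_pmf(i, n, p):
    """Binomial probability b(i; n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson_pmf(i, lam):
    """Limit probability lam^i / i! * exp(-lam)."""
    return lam**i / factorial(i) * exp(-lam)

def max_gap(n, lam, i_max=20):
    """Largest |b(i; n, lam/n) - Poisson(i; lam)| for i = 0..i_max (i_max <= n)."""
    p = lam / n
    return max(abs(binom_pmf(i, n, p) - poisson_pmf(i, lam))
               for i in range(i_max + 1))

lam = 2.0                       # illustrative value, not from the text
gaps = [max_gap(n, lam) for n in (20, 200, 2000)]
print([round(g, 5) for g in gaps])
```

Each tenfold increase of $n$ reduces the gap by roughly a factor of ten, which suggests an error of order $1/n$ for fixed $\lambda$.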
Example 5.5 Let $X$ have a binomial distribution with parameters $n = 12$ and $p = 0.4$. For the exact probability

$$p_4 = \binom{12}{4} (0.4)^4 (0.6)^8 = 0.2128$$

the local limit theorem (5.29) yields the approximative value

$$p_4 \approx \frac{1}{\sqrt{2\pi}\,\sqrt{12 \times 0.4 \times 0.6}} \exp\left(-\frac{1}{2} \left(\frac{4 - 12 \times 0.4}{\sqrt{12 \times 0.4 \times 0.6}}\right)^2\right) = 0.2104,$$

whereas the central limit theorem (5.24) provides the approximative value

$$p_4 \approx \Phi\left(\frac{4 + \frac{1}{2} - 12 \times 0.4}{\sqrt{12 \times 0.4 \times 0.6}}\right) - \Phi\left(\frac{4 - \frac{1}{2} - 12 \times 0.4}{\sqrt{12 \times 0.4 \times 0.6}}\right)$$
$$= \Phi(-0.1768) - \Phi(-0.7660) = 0.4298 - 0.2218 = 0.2080.$$
The Poisson approximation with $\lambda = np = 4.8$ gives the worst result:
$$p_4 \approx \frac{4.8^4}{4!}\,e^{-4.8} = 0.1820.$$ □
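The four numbers of example 5.5 can be recomputed side by side. In the sketch below the values are evaluated from the formulas (5.29), (5.24) and theorem 5.10 rather than copied from the text; the variable names are illustrative.

```python
from math import comb, exp, factorial, pi, sqrt
from statistics import NormalDist

n, p, i = 12, 0.4, 4
mean, sd = n * p, sqrt(n * p * (1 - p))

# Exact binomial probability b(4; 12, 0.4).
exact = comb(n, i) * p**i * (1 - p)**(n - i)

# Local limit theorem (5.29): normal density at i, scaled by 1/sd.
local = exp(-0.5 * ((i - mean) / sd) ** 2) / (sqrt(2 * pi) * sd)

# Central limit theorem (5.24): normal mass on [i - 1/2, i + 1/2].
phi = NormalDist().cdf
integral = phi((i + 0.5 - mean) / sd) - phi((i - 0.5 - mean) / sd)

# Poisson approximation (theorem 5.10) with lambda = np.
poisson = mean**i / factorial(i) * exp(-mean)

for name, value in [("exact", exact), ("local", local),
                    ("integral", integral), ("poisson", poisson)]:
    print(f"{name:9s}{value:.4f}")
```

The Poisson value is furthest from the exact one, as expected for $p = 0.4$, which is far from the regime $p \to 0$ of theorem 5.10.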


To formulate the next local limit theorem for sums of discrete random variables, the following definition is needed:


Definition 5.3 A discrete random variable $X$, which for given real numbers $a$ and $b$ with $b > 0$, can only take on values of the form

$$x_k = a + k\,b; \quad k = 0, \pm 1, \pm 2, \ldots, \tag{5.31}$$
is called lattice distributed. The corresponding probability distribution of $X$ is called a lattice distribution. The largest constant $b$, which allows the representation of all realizations of $X$ by (5.31), is called the lattice constant of $X$ or its probability distribution. Specifically, a lattice distribution with $a = 0$ is an arithmetic distribution. ●


Lattice distributed random variables obviously include all integer-valued random variables, such as geometrically, binomially, and Poisson distributed random variables.


Theorem 5.11 (Gnedenko) Let $\{X_1, X_2, \ldots\}$ be a sequence of independent, identically lattice distributed random variables with values (5.31), finite mean value $\mu$, finite, positive variance $\sigma^2$, and

$$P_n(m) = P(X_1 + X_2 + \cdots + X_n = na + mb); \quad m = 0, \pm 1, \pm 2, \ldots$$


Then the following limit relation is true uniformly in $m$ if and only if $b$ is the lattice constant of the $X_1, X_2, \ldots$:

$$\lim_{n \to \infty} \left[\frac{\sigma \sqrt{n}}{b}\,P_n(m) - \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{mb + na - n\mu}{\sigma \sqrt{n}}\right)^2\right)\right] = 0.$$ ■

Finally, a local limit theorem is given which deals with the convergence of the density of sums of random variables.


Theorem 5.12 (Gnedenko) Let $\{X_1, X_2, \ldots\}$ be a sequence of independent, identically distributed, continuous random variables with bounded density, mean value $\mu = 0$, and positive, finite variance $\sigma^2$. If $f_n(x)$ denotes the density of

$$\frac{1}{\sigma \sqrt{n}} \sum_{i=1}^{n} X_i,$$
then $f_n(x)$ converges uniformly in $x$ to the density of the standard normal distribution:

$$\lim_{n \to \infty} f_n(x) = \varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < +\infty.$$ ■


5.3 EXERCISES 


5.1) On average, 6% of the citizens of a large town suffer from severe hypertension. Let $X$ be the number of people in a sample of $n$ randomly selected citizens from this town who suffer from this disease.
(1) By making use of Chebyshev's inequality find the smallest positive integer $n_{\min}$ with property

$$P\left(\left|\tfrac{1}{n} X - 0.06\right| \ge 0.01\right) \le 0.05 \quad \text{for all } n \text{ with } n \ge n_{\min}.$$


(2) Find a positive integer $n_{\min}$ satisfying this relationship by using theorem 5.6.


5.2) The measurement error $X$ of a measuring device has mean value $E(X) = 0$ and variance $Var(X) = 0.16$. The random outcomes of $n$ independent measurements are $X_1, X_2, \ldots, X_n$, i.e., the $X_i$ are independent random variables, identically distributed as $X$.

(1) By Chebyshev's inequality, determine the smallest integer $n = n_{\min}$ with the property that the arithmetic mean of $n$ measurements differs from $E(X) = 0$ by less than 0.1 with a probability of at least 0.99.

(2) On the additional assumption that $X$ is continuous with unimodal density and mode $x = 0$, solve (1) by applying the Gauss inequality (5.4).

(3) Solve (1) on condition that $X = N(0, 0.16)$.


5.3) A manufacturer of TV sets knows from past experience that 4% of his products 
do not pass the final quality check. 

(1) What is the probability that in the total monthly production of 2000 sets between 
60 and 100 sets do not pass the final quality check? 

(2) How many sets have at least to be produced a month to make sure that at least 2000 sets pass the final quality check with probability 0.9?

5.4) The daily demand for a certain medication in a country is given by a random variable $X$ with mean value 28 packets per day and with a variance of 64. The daily demands are independent of each other and distributed as $X$.

(1) What amount of packets should be ordered for a year with 365 days so that the total annual demand does not exceed the supply with probability 0.99?


(2) Let $X_i$ be the demand at day $i = 1, 2, \ldots$, and
$$\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$


Determine the smallest integer $n = n_{\min}$ so that the probability of the occurrence of the event

$$|\bar X_n - 28| > 0.02$$

does not exceed 0.05.


5.5) According to the order, the rated nominal capacitance of condensers in a large delivery should be 300 μF. Their actual rated nominal capacitances are, however, random variables $X$ with

$$E(X) = 300 \quad \text{and} \quad Var(X) = 144.$$


(1) By means of Chebyshev's inequality determine an upper bound for the probability of the event $A$ that $X$ does not differ from the rated nominal capacitance by more than 5%.


(2) Under the additional assumption that $X$ is a continuous random variable with unimodal density and mode $x = 300$, solve (1) by means of the Gauss inequality (5.4).


(3) Determine the exact probability on condition that
$X = N(300, 144)$.


(4) A delivery contains 600 condensers. Their capacitances are independent and identically distributed as $X$. The distribution of $X$ has the same properties as stated under (2). By means of the Gauss inequality (5.4) give a lower bound for the probability that the arithmetic mean of the capacitances of the condensers in the delivery differs from $E(X) = 300$ by less than 0.01.


5.6) A digital transmission channel distorts on average 1 out of 10 000 bits during transmission. The bits are transmitted independently of each other.


(1) Give the exact formula for the probability of the random event $A$ that amongst $10^6$ sent bits there are at least 80 bits distorted.


(2) Determine the probability of $A$ by approximation of the normal distribution to the binomial distribution.


5.7) Solve the problem of example 2.4 (page 51) by making use of the normal approximation to the binomial distribution and compare with the exact result.

5.8) Solve the problem of example 2.6 (page 54) by making use of the normal approximation to the hypergeometric distribution and compare with the exact result.


5.9) The random number of asbestos particles per 1 mm³ in the dust of an industrial area is Poisson distributed with parameter $\lambda = 8$.


What is the probability that in 1 cm³ of dust there are
(1) at least 10 000 asbestos particles, and
(2) between 8000 and 12 000 asbestos particles (including the bounds)?


5.10) The number of e-mails, which daily arrive at a large company, is Poisson distributed with parameter

$$\lambda = 22\,400.$$
What is the probability that daily between 22 300 and 22 500 e-mails arrive?


5.11) In 1 kg of a tapping of cast iron melt there are on average 1.2 impurities. What is the probability that in a 1000 kg tapping there are at least 1400 impurities? The spatial distribution of the impurities in a tapping is assumed to be Poisson.


5.12) After six weeks, 24 seedlings, which had been planted at the same time, reach the random heights $X_1, X_2, \ldots, X_{24}$, which are independent, identically exponentially distributed as $X$ with mean value $\mu = 32$ cm.
Based on the Gauss inequalities, determine
(1) an upper bound for the probability that the arithmetic mean

$$\bar X_{24} = \frac{1}{24} \sum_{i=1}^{24} X_i$$

differs from $\mu$ by more than 0.06 cm,
(2) a lower bound for the probability that the deviation of $\bar X_{24}$ from $\mu$ does not exceed 0.06 cm.


5.13) Under otherwise the same assumptions as in exercise 5.12, only 6 seedlings had been planted. Determine


(1) the exact probability that the arithmetic mean

$$\bar X_6 = \frac{1}{6} \sum_{i=1}^{6} X_i$$

exceeds $\mu = 32$ cm by more than 0.06 cm (Hint: Erlang distribution),


(2) by means of the central limit theorem, determine a normal approximation to the probability

$$P(\bar X_6 - 32 > 0.06).$$


Give reasons why the approximation may not be satisfactory.

5.14) The continuous random variable $X$ is uniformly distributed on $[0, 2]$.
(1) Draw the graph of the function
$$p(\varepsilon) = P(|X - 1| \ge \varepsilon)$$

in dependence of $\varepsilon$, $0 < \varepsilon < 1$.
(2) Compare this graph with the upper bound for the probability

$$P(|X - 1| \ge \varepsilon)$$
given by the Chebyshev inequality, $0 < \varepsilon < 1$.
(3) Try to improve the Chebyshev upper bound for

$$P(|X - 1| \ge \varepsilon)$$
by the Markov upper bound (5.8) for $a = 3$ and $a = 4$.

PART II 


Stochastic Processes 


CHAPTER 6 


Basics of Stochastic Processes 


6.1 MOTIVATION AND TERMINOLOGY 


A random variable $X$ is the outcome of a random experiment under fixed conditions. A change of these conditions will influence the outcome of the experiment, i.e. the probability distribution of $X$ will change. Varying conditions can be taken into account by considering random variables which depend on a deterministic parameter $t$: $X = X(t)$. This approach leads to more general random experiments than the ones defined in section 1.1. To illustrate such generalized random experiments, two simple examples will be considered.


Example 6.1 a) At a fixed geographical point, the temperature is measured every day at 12:00. Let $x_i$ be the temperature measured on the $i$th day of a year. The value of $x_i$ will vary from year to year and, hence, it can be considered a realization of a random variable $X_i$. Thus, $X_i$ is the (random) temperature measured on the $i$th day of a year at 12:00. Apart from random fluctuations of the temperature, the $X_i$ also depend on a deterministic parameter, namely on the time, or, more precisely, on the day of the year. However, if one is only interested in the temperatures $X_1, X_2, X_3$ on the first 3 days (or any other 3 consecutive days) of the year, then these temperatures are at least approximately identically distributed. Nevertheless, indexing the daily temperatures is necessary, because modeling the obviously existing statistical dependence between the daily temperatures requires knowledge of the joint probability distribution of the random vector $(X_1, X_2, X_3)$. This situation and the problems connected with it motivate the introduction of the generalized random experiment daily measurement of the temperature at a given geographical point at 12:00 during a year. The random outcomes of this generalized random experiment are sequences of random variables $\{X_1, X_2, \ldots, X_{365}\}$ with the $X_i$ being generally neither independent nor identically distributed. If on the $i$th day temperature $x_i$ has been measured, then the vector $(x_1, x_2, \ldots, x_{365})$ can be interpreted as a function $x = x(t)$, defined at discrete time points $t$, $t \in \{1, 2, \ldots, 365\}$: $x(t) = x_i$ for $t = i$. Vector $(x_1, x_2, \ldots, x_{365})$ is a realization of the random vector $(X_1, X_2, \ldots, X_{365})$.


b) If a sensor graphically records the temperature over the year, then the outcome of the measurement is a continuous function of time $t$: $x = x(t)$, $0 \le t \le 1$, where $x(t)$ is a realization of the random temperature $X(t)$ at time $t$ at a fixed geographical location. Hence it makes sense to introduce the generalized random experiment continuous measurement of the temperature during a year at a given geographical location. It will be denoted as $\{X(t),\ 0 \le t \le 1\}$.

A complete probabilistic characterization of this generalized random experiment requires knowledge of the joint probability distributions of all possible random vectors

$$(X(t_1), X(t_2), \ldots, X(t_n)), \quad 0 \le t_1 < t_2 < \cdots < t_n \le 1; \quad n = 1, 2, \ldots$$


This knowledge allows for statistically modelling the dependence between the $X(t_i)$ in any sequence of random variables $X(t_1), X(t_2), \ldots, X(t_n)$. It is quite obvious that for small time differences $t_{i+1} - t_i$ there is a strong statistical dependence between $X(t_i)$ and $X(t_{i+1})$. But there may also be a dependence between $X(t_i)$ and $X(t_k)$ for large time differences $t_k - t_i$ due to the inertia of weather patterns over an area. □


Example 6.2 The deterministic parameter, which influences the outcome of a random
experiment, need not be time. For instance, if at a fixed time point and a fixed
observation point the temperature is measured along a vertical of length $L$ to the earth's
surface, then one obtains a function $x = x(h)$, $0 \le h \le L$, which obviously depends on
the distance $h$ of the measurement point to the earth's surface. But if the experiment
is repeated in the following years under the same conditions (same time, location,
and measurement procedure), then, in view of the occurrence of unpredictable
influences, different functions $x = x(h)$ will be obtained. Hence, the temperature at
distance $h$ is a random variable $X(h)$, and the generalized random experiment measuring
the temperature along a vertical of length $L$, denoted as $\{X(h),\ 0 \le h \le L\}$, has
outcomes which are real functions of $h$: $x = x(h)$, $0 \le h \le L$.


In this situation, it also makes sense to consider the temperature in dependence on both
$h$ and the time point of observation $t$: $x = x(h, t)$; $0 \le h \le L$, $t \ge 0$. Then the
observation $x$ depends on a vector of deterministic parameters:

$$x = x(\theta), \quad \theta = (h, t).$$

In this case, the outcomes of the corresponding generalized random experiment are
surfaces in the $(h, t, x)$-space. However, this book only considers one-dimensional
parameter spaces.


Figure 6.1 Random variation of the diameter of a nylon rope (vertical axis: diameter $x(d)$; horizontal axis: distance $d$ from 0 to 10)


An already 'classical' example for illustrating the fact that the parameter need not be
time is due to Cramer, Leadbetter (1967): A machine is supposed to continuously
produce ropes of length 10 m with a given nominal diameter of 5 mm. Despite
maintaining constant production conditions, minor variations of the rope diameter
cannot be avoided technologically. Thus, when measuring the actual diameter $x$ of a
single rope at a distance $d$ from the origin, one gets a function $x = x(d)$ with $0 \le d \le 10$.
This function will randomly vary from rope to rope. This suggests the introduction of
the generalized random experiment continuous measurement of the rope diameter in
dependence on the distance $d$ from the origin. If $X(d)$ denotes the diameter of a
randomly selected rope at a distance $d$ from the origin, then it makes sense to introduce
the corresponding generalized random experiment

$$\{X(d),\ 0 \le d \le 10\}$$

with outcomes $x = x(d)$, $0 \le d \le 10$ (Figure 6.1). □


In contrast to the random experiments considered in chapter 1, the outcomes of which
are real numbers, the outcomes of the generalized random experiments dealt with in
examples 6.1 and 6.2 are real functions. Hence, in the literature such generalized
random experiments are frequently called random functions. However, the term
stochastic process is more common and will be used throughout the book. In
order to characterize the concept of a stochastic process more precisely, further
notation is required: Let the random variable of interest $X$ depend on a parameter $t$, which
assumes values from a set $T$: $X = X(t)$, $t \in T$. To simplify the terminology and in
view of the overwhelming majority of applications, in this book the parameter $t$ is
interpreted as time. Thus, $X(t)$ is the random variable $X$ at time $t$, and $T$ denotes the
whole observation time span. Further, let $Z$ denote the set of all values the random
variables $X(t)$ can assume for all $t \in T$.
Stochastic Process A family of random variables $\{X(t), t \in T\}$ is called a stochastic
process with parameter space $T$ and state space $Z$.


If $T$ is a finite or countably infinite set, then $\{X(t), t \in T\}$ is called a stochastic
process in discrete time or a discrete-time stochastic process. Such processes can be
written as a sequence of random variables $\{X_1, X_2, \ldots\}$ (example 6.1 a). On the
other hand, every sequence of random variables can be thought of as a stochastic process
in discrete time. If $T$ is an interval, then $\{X(t), t \in T\}$ is a stochastic process in
continuous time or a continuous-time stochastic process. A stochastic process $\{X(t), t \in T\}$
is said to be discrete if its state space $Z$ is a finite or a countably infinite set, and a
stochastic process $\{X(t), t \in T\}$ is said to be continuous if $Z$ is an interval. Thus, there
are discrete stochastic processes in discrete time, discrete stochastic processes in
continuous time, continuous stochastic processes in discrete time, and continuous
stochastic processes in continuous time. Throughout this book the state space $Z$ is usually
assumed to be a subset of the real axis.


If the stochastic process $\{X(t), t \in T\}$ is observed over the whole time period $T$, i.e.
the values of $X(t)$ are registered for all $t \in T$, then one obtains a real function

$$x = x(t), \quad t \in T.$$

Such a function is called a sample path, a trajectory, or a realization of the stochastic
process. In this book the term sample path is used. The sample paths of a stochastic
process in discrete time are, therefore, sequences of real numbers, whereas the
sample paths of stochastic processes in continuous time can be any functions of time.
The sample paths of a discrete stochastic process in continuous time are piecewise
constant functions (step functions). The set of all sample paths of a stochastic process
with parameter space $T$ is, therefore, a subset of all functions over the domain $T$.


In engineering, science, and economics there are many time-dependent random
phenomena which can be modeled by stochastic processes: In an electrical circuit it is
not possible to keep the voltage strictly constant. Random fluctuations of the voltage
are for instance caused by thermal noise. If $v(t)$ denotes the voltage measured at time
point $t$, then $v = v(t)$ is a sample path of a stochastic process $\{V(t), t \ge 0\}$, where $V(t)$
is the random voltage at time $t$ (Figure 6.2). Producers of radar and satellite-supported
communication systems have to take into account a phenomenon called fading.


Figure 6.2 Voltage fluctuations caused by random noise (vertical axis: voltage $v(t)$; horizontal axis: time $t$)
This is characterized by random fluctuations in the energy of received signals caused
by the dispersion of radio waves as a result of inhomogeneities in the atmosphere and
by meteorological and industrial noise. Both meteorological and industrial noise
create electrical discharges in the atmosphere which occur at random time points with
randomly varying intensity. 'Classic' applications of stochastic processes in economics
are modeling the fluctuations of share prices, yields, and prices of commodities over
time. In operations research, stochastic processes describe the development in time
of the 'states' of queueing, inventory, and reliability systems. In statistical quality
control, they model the fluctuation of quality criteria over time. In medicine, the
development in time of 'quality parameters' of health such as blood pressure and cholesterol
level as well as the spread of epidemics are typical examples of stochastic processes.


Important impulses for the development and application of stochastic processes came 
from biology: stochastic models for population dynamics from cell to mammal level, 
competition models (predator-prey), capture-recapture models, growth processes, 
and many more. 


6.2 CHARACTERISTICS AND EXAMPLES 


From the mathematical point of view, the given heuristic explanation of a stochastic
process needs to be supplemented. Let $F_t(x)$ be the distribution function of $X(t)$:

$$F_t(x) = P(X(t) \le x), \quad t \in T.$$

The family of the one-dimensional distribution functions

$$\{F_t(x),\ t \in T\}$$

is the one-dimensional probability distribution of $\{X(t), t \in T\}$. In view of the
statistical dependence, which generally exists between the $X(t_1), X(t_2), \ldots, X(t_n)$ for any
$t_1, t_2, \ldots, t_n$, the family of the one-dimensional distribution functions $\{F_t(x), t \in T\}$
does not completely characterize a stochastic process (see examples 6.1 and 6.2).
A stochastic process $\{X(t), t \in T\}$ is only then completely characterized if for all
positive integers $n = 1, 2, \ldots$, for all $n$-tuples $(t_1, t_2, \ldots, t_n)$ with $t_i \in T$, and for all vectors
$(x_1, x_2, \ldots, x_n)$ with $x_i \in Z$, the joint distribution function of the random vector
$(X(t_1), X(t_2), \ldots, X(t_n))$ is known:

$$F_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n) = P(X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n). \tag{6.1}$$


The set of all these joint distribution functions defines the probability distribution of
the stochastic process. For a discrete stochastic process, it is generally simpler to
characterize its probability distribution by the probabilities

$$P(X(t_1) \in A_1, X(t_2) \in A_2, \ldots, X(t_n) \in A_n)$$

for all $t_1, t_2, \ldots, t_n$ with $t_i \in T$ and $A_i \subseteq Z$; $i = 1, 2, \ldots, n$; $n = 1, 2, \ldots$.
Trend Function Assuming the existence of $E(X(t))$ for all $t \in T$, the trend or trend
function of the stochastic process $\{X(t), t \in T\}$ is the mean value of $X(t)$ as a function
of $t$:

$$m(t) = E(X(t)), \quad t \in T. \tag{6.2}$$

Thus, the trend function of a stochastic process describes the average development of
the process in time. If the densities $f_t(x) = dF_t(x)/dx$ exist, then

$$m(t) = \int_{-\infty}^{+\infty} x\, f_t(x)\, dx, \quad t \in T.$$

Covariance Function The covariance function of a stochastic process $\{X(t), t \in T\}$
is the covariance between the random variables $X(s)$ and $X(t)$ as a function of $s$ and $t$.
Hence, in view of (3.37) and (3.38), page 135,

$$C(s, t) = \mathrm{Cov}(X(s), X(t)) = E([X(s) - m(s)]\,[X(t) - m(t)]); \quad s, t \in T, \tag{6.3}$$

or

$$C(s, t) = E(X(s)X(t)) - m(s)\,m(t); \quad s, t \in T. \tag{6.4}$$

In particular,

$$C(t, t) = \mathrm{Var}(X(t)). \tag{6.5}$$

The covariance function is a symmetric function of $s$ and $t$:

$$C(s, t) = C(t, s). \tag{6.6}$$


Since the covariance function $C(s, t)$ is a measure for the degree of the statistical
dependence between $X(s)$ and $X(t)$, one expects that

$$\lim_{|t-s| \to \infty} C(s, t) = 0. \tag{6.7}$$

Example 6.3 shows that this need not be the case.


Correlation Function The correlation function of $\{X(t), t \in T\}$ is the correlation
coefficient $\rho(s, t) = \rho(X(s), X(t))$ between $X(s)$ and $X(t)$ as a function of $s$ and $t$.
According to (3.43),

$$\rho(s, t) = \frac{\mathrm{Cov}(X(s), X(t))}{\sqrt{\mathrm{Var}(X(s))}\,\sqrt{\mathrm{Var}(X(t))}}. \tag{6.8}$$

The covariance function of a stochastic process is also called autocovariance
function and the correlation function autocorrelation function. This terminology avoids
mistakes when dealing with covariances and correlations between $X(s)$ and $Y(t)$ for
different stochastic processes $\{X(t), t \in T\}$ and $\{Y(t), t \in T\}$. The cross covariance
function between these two processes is defined as

$$C(s, t) = \mathrm{Cov}(X(s), Y(t)) = E([X(s) - m_X(s)]\,[Y(t) - m_Y(t)]); \quad s, t \in T, \tag{6.9}$$

with $m_X(t) = E(X(t))$ and $m_Y(t) = E(Y(t))$. Correspondingly, the cross correlation
function between the processes $\{X(t), t \in T\}$ and $\{Y(t), t \in T\}$ is

$$\rho(s, t) = \frac{\mathrm{Cov}(X(s), Y(t))}{\sqrt{\mathrm{Var}(X(s))}\,\sqrt{\mathrm{Var}(Y(t))}}. \tag{6.10}$$


As pointed out in section 3.1.3 (page 139), the advantage of the correlation coefficient
over the covariance is that it allows for comparing the (linear) dependencies between
different pairs of random variables. Being able to compare the dependency between
two stochastic processes by their cross-correlation function is important for processes
which are more or less obviously dependent as, for instance, the development in time
of air temperature and air moisture or air temperature and CO2 content of the air.


Semi-variogram The semi-variogram or, shortly, variogram of a stochastic process
$\{X(t), t \in T\}$ is defined as

$$\gamma(s, t) = \tfrac{1}{2}\, E\big([X(t) - X(s)]^2\big) \tag{6.11}$$

as a function of $s$ and $t$; $s, t \in T$. The variogram is obviously a symmetric function in
$s$ and $t$: $\gamma(s, t) = \gamma(t, s)$.

The concept of a variogram has its origin in geostatistics for describing properties of
random fields, i.e., stochastic processes which depend on a multidimensional
deterministic parameter $t$, which refers to a location, but may also include time.


Example 6.3 (cosine wave with random amplitude) Let

$$X(t) = A \cos \omega t,$$

where $A$ is a nonnegative random variable with $E(A) < \infty$. The process $\{X(t), t \ge 0\}$
can be interpreted as the output of an oscillator which is selected from a set of
identical ones. (Random deviations of the amplitudes from a nominal value are
technologically unavoidable.) The trend function of this process is

$$m(t) = E(A) \cos \omega t.$$

By (6.4), its covariance function is

$$C(s, t) = E([A \cos \omega s]\,[A \cos \omega t]) - m(s)\,m(t) = [E(A^2) - (E(A))^2]\,(\cos \omega s)(\cos \omega t).$$

Hence,

$$C(s, t) = \mathrm{Var}(A)\,(\cos \omega s)(\cos \omega t).$$

Obviously, the process does not have property (6.7). Since there is a functional
relationship between $X(s)$ and $X(t)$ for any $s$ and $t$, $X(s)$ and $X(t)$ cannot tend to become
independent for $|t - s| \to \infty$. Actually, the correlation function $\rho(s, t)$ between $X(s)$
and $X(t)$ has absolute value 1 for all $(s, t)$ with $\cos \omega s \cdot \cos \omega t \ne 0$. □
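
The formulas of example 6.3 can be checked by simulation. The following is a minimal sketch, not part of the original example: it assumes, purely for illustration, that $A$ is exponentially distributed with mean 1 (so that $E(A) = \mathrm{Var}(A) = 1$), and all function names are hypothetical. It compares the empirical trend and covariance with $E(A)\cos \omega t$ and $\mathrm{Var}(A)(\cos \omega s)(\cos \omega t)$, using two time points that are a full period apart.

```python
import math
import random

def simulate_cosine_wave(omega, times, n_paths, seed=0):
    """Draw sample paths X(t) = A cos(omega t) at the given times.

    Assumption for illustration only: A is exponentially distributed
    with mean 1, so E(A) = 1 and Var(A) = 1.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        a = rng.expovariate(1.0)  # one random amplitude per sample path
        paths.append([a * math.cos(omega * t) for t in times])
    return paths

def empirical_trend(paths, k):
    """Sample mean of X(times[k]) over all simulated paths."""
    return sum(p[k] for p in paths) / len(paths)

def empirical_covariance(paths, j, k):
    """Sample covariance between X(times[j]) and X(times[k])."""
    mj = empirical_trend(paths, j)
    mk = empirical_trend(paths, k)
    return sum((p[j] - mj) * (p[k] - mk) for p in paths) / len(paths)

omega = 1.0
times = [0.0, 0.5, 2.0 * math.pi]  # the last point is one full period after t = 0
paths = simulate_cosine_wave(omega, times, n_paths=50_000)

m_hat = empirical_trend(paths, 1)          # estimate of m(0.5)
c_hat = empirical_covariance(paths, 0, 2)  # estimate of C(0, 2*pi)
print(m_hat, math.cos(omega * 0.5))                    # compare with E(A) cos(omega t)
print(c_hat, math.cos(0.0) * math.cos(2.0 * math.pi))  # compare with Var(A) cos cos
```

Under these assumptions the covariance between two points one period apart stays near $\mathrm{Var}(A)$ however far apart they are, which illustrates why property (6.7) fails.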


The stochastic process considered in example 6.3 has a special feature: For a given
value $a$ that the random variable $A$ has assumed, the process develops in a strictly
deterministic way. That means, by only observing a sample path of such a process
over an arbitrarily small time interval, one can predict the further development of the
sample path with absolute certainty. (The same comment applies to examples 6.6 and
6.7.) More complicated stochastic processes arise when random influences
continuously, or at least repeatedly, affect the phenomenon of interest. The following
example belongs to this category.

Figure 6.3 Pulse code modulation


Example 6.4 (pulse code modulation) A source generates symbols 0 or 1
independently with respective probabilities $p$ and $1 - p$. The symbol '0' is transmitted by
sending nothing during a time interval of length one. The symbol '1' is transmitted by
sending a pulse with constant amplitude $a$ during a time unit of length one. The source
has started operating in the past. A stochastic signal (sequence of symbols) generated
in this way is represented by the stochastic process $\{X(t), t \in (-\infty, +\infty)\}$ with

$$X(t) = \sum_{n=-\infty}^{+\infty} A_n\, h(t - n), \tag{6.12}$$

so that $X(t) = A_n$ for $n \le t < n + 1$,


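where the $A_n$; $n = 0, \pm 1, \pm 2, \ldots$; are independent binary random variables defined by

$$A_n = \begin{cases} 0 & \text{with probability } p, \\ a & \text{with probability } 1 - p, \end{cases}$$

and $h(t)$ is given by

$$h(t) = \begin{cases} 1 & \text{for } 0 \le t < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

For any $t$,

$$X(t) = \begin{cases} 0 & \text{with probability } p, \\ a & \text{with probability } 1 - p. \end{cases}$$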


For example, the section of a sample path $x = x(t)$ plotted in Figure 6.3 is generated
by the following partial sequence of a signal:

$$\cdots\, 1\,0\,1\,1\,0\,0\,1\, \cdots$$

The role of the function $h(t)$ is to keep $X(t)$ at level 0 or $a$, respectively, in the
intervals $[n, n+1)$. Note that the time point $t = 0$ coincides with the beginning of a new
transmission period. The process has a constant trend function:

$$m(t) = a \cdot P(X(t) = a) + 0 \cdot P(X(t) = 0) = a(1 - p).$$
For $n \le s, t < n + 1$; $n = 0, \pm 1, \pm 2, \ldots$,

$$E(X(s)X(t)) = E(X(s)X(t) \mid X(s) = a)\, P(X(s) = a) + E(X(s)X(t) \mid X(s) = 0)\, P(X(s) = 0) = a^2 (1 - p).$$

Therefore,

$$\mathrm{Cov}(X(s), X(t)) = a^2(1 - p) - a^2(1 - p)^2 = a^2 p(1 - p) \quad \text{for } n \le s, t < n + 1.$$

If $m \le s < m + 1$ and $n \le t < n + 1$ with $m \ne n$, then $X(s)$ and $X(t)$ are independent
random variables. Hence, the covariance function of $\{X(t), t \in (-\infty, +\infty)\}$ is

$$C(s, t) = \begin{cases} a^2 p(1 - p) & \text{for } n \le s, t < n + 1;\ n = 0, \pm 1, \pm 2, \ldots, \\ 0 & \text{elsewhere.} \end{cases}$$
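
The trend and covariance function just derived can be checked by a small Monte Carlo experiment. The sketch below is illustrative only: the function names, the parameter values $a = 2$ and $p = 0.3$, and the restriction to nonnegative time points are assumptions made here, not part of the example.

```python
import math
import random

def pcm_value(symbols, t):
    """Return X(t) = A_n for n <= t < n + 1, where A_n = symbols[n] and t >= 0."""
    return symbols[math.floor(t)]

def estimate_trend_and_cov(s, t, a=2.0, p=0.3, n_paths=40_000, seed=1):
    """Monte Carlo estimates of m(s) and C(s, t) for 0 <= s <= t."""
    rng = random.Random(seed)
    n_slots = math.floor(t) + 1
    sum_s = sum_t = sum_st = 0.0
    for _ in range(n_paths):
        # Symbol 0 with probability p, pulse of amplitude a with probability 1 - p.
        symbols = [0.0 if rng.random() < p else a for _ in range(n_slots)]
        xs, xt = pcm_value(symbols, s), pcm_value(symbols, t)
        sum_s += xs
        sum_t += xt
        sum_st += xs * xt
    mean_s, mean_t = sum_s / n_paths, sum_t / n_paths
    return mean_s, sum_st / n_paths - mean_s * mean_t

a, p = 2.0, 0.3
m_same, c_same = estimate_trend_and_cov(0.2, 0.9)  # both points in [0, 1)
m_diff, c_diff = estimate_trend_and_cov(0.2, 1.4)  # points in different intervals
print(m_same, a * (1 - p))          # compare with a(1 - p)
print(c_same, a * a * p * (1 - p))  # compare with a^2 p (1 - p)
print(c_diff, 0.0)                  # compare with 0
```

The covariance estimate depends on whether $s$ and $t$ fall into the same transmission interval, not merely on $t - s$, which is why this process is not stationary.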


Although the stochastic process analyzed in this example has a rather simple struc- 
ture, it is of considerable importance in physics, electrical engineering, and commu- 
nication; for more information, see e.g. Gardner (1989). A modification of the pulse 
code modulation process is considered in example 6.8. As the following example 
shows, the pulse code modulation is a special shot noise process. 


Example 6.5 (shot noise process) At time points $T_n$, pulses of random intensity $A_n$
are induced. The sequences $\{T_1, T_2, \ldots\}$ and $\{A_1, A_2, \ldots\}$ are assumed to be
discrete-time stochastic processes with properties

1) With probability 1, $T_1 < T_2 < \cdots$ and $\lim_{n \to \infty} T_n = \infty$,

2) $E(A_n) < \infty$; $n = 1, 2, \ldots$.


In communication theory, the sequence $\{(T_n, A_n);\ n = 1, 2, \ldots\}$ is called a pulse
process. (In section 7.1, it will be called a marked point process.) The function $h(t)$, the
response of a system to a pulse, has properties

$$h(t) = 0 \ \text{for } t < 0 \quad \text{and} \quad \lim_{t \to \infty} h(t) = 0. \tag{6.13}$$

The stochastic process $\{X(t), t \in (-\infty, +\infty)\}$ defined by

$$X(t) = \sum_{n=1}^{\infty} A_n\, h(t - T_n) \tag{6.14}$$


is called a shot noise process or just shot noise. It quantifies the additive superposition
of the responses of a system to pulses. The factors $A_n$ are sometimes called
amplitudes of the shot noise process. In many applications, the $A_n$ are independent,
identically distributed random variables, or, as in example 6.4, even constant.


If the sequences of the $T_n$ and $A_n$ are doubly infinite,

$$\{T_n;\ n = 0, \pm 1, \pm 2, \ldots\} \quad \text{and} \quad \{A_n;\ n = 0, \pm 1, \pm 2, \ldots\},$$

then the shot noise process $\{X(t), t \in (-\infty, +\infty)\}$ is defined as

$$X(t) = \sum_{n=-\infty}^{+\infty} A_n\, h(t - T_n). \tag{6.15}$$


A well-known physical phenomenon which can be modeled by a shot noise process
is the fluctuation of the anode current in vacuum tubes (tube noise). This fluctuation
is caused by random current impulses, which are initiated by emissions of electrons
from the cathode at random time points (Schottky effect); see Schottky (1918). The
term shot noise has its origin in the fact that the effect of firing small shot at a metal
slab can be modeled by a stochastic process of structure (6.15). More examples of
shot noise processes are discussed in chapter 7, where special assumptions on the
underlying pulse process are made. □
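
Formula (6.14) translates directly into a few lines of code. The sketch below is only an illustration under assumptions that are not made in the text: the response function is taken to be $h(t) = e^{-t}$ for $t \ge 0$ (which satisfies (6.13)), the gaps between pulses are exponentially distributed, and the amplitudes are uniform on $[0, 1]$. All names are hypothetical.

```python
import math
import random

def response(t, decay=1.0):
    """Assumed response function: h(t) = exp(-decay * t) for t >= 0, else 0."""
    return math.exp(-decay * t) if t >= 0.0 else 0.0

def shot_noise(t, pulse_times, amplitudes, h=response):
    """X(t) = sum over n of A_n h(t - T_n), formula (6.14), for finitely many pulses."""
    return sum(a * h(t - tn) for tn, a in zip(pulse_times, amplitudes))

def random_pulse_process(horizon, rate, rng):
    """Assumed pulse process: exponential gaps between pulses, amplitudes uniform on [0, 1]."""
    times, amps = [], []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        amps.append(rng.random())
        t += rng.expovariate(rate)
    return times, amps

rng = random.Random(0)
times, amps = random_pulse_process(horizon=10.0, rate=2.0, rng=rng)
print(len(times), shot_noise(10.0, times, amps))  # number of pulses and X(10)

# Hand-checkable case: pulses at T = 1, 2 with amplitudes 1, 3, observed at t = 2.5.
x_check = shot_noise(2.5, [1.0, 2.0], [1.0, 3.0])
print(x_check, math.exp(-1.5) + 3.0 * math.exp(-0.5))
```

Because $h(t) = 0$ for $t < 0$, pulses that occur after the observation time contribute nothing, so only the history of the pulse process up to $t$ enters $X(t)$.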


6.3 CLASSIFICATION OF STOCHASTIC PROCESSES 


Stochastic processes are classified with regard to properties which reflect, e.g., their
dependence on time, the statistical dependence of their developments over disjoint
time intervals, and the influence of the history or the current state of a stochastic
process on its future evolution. In the context of example 6.1: Has the date any
influence on the daily temperature at 12:00? (That need not be the case if the
measurement point is near the equator.) Or, has the sample path of the temperature in
January any influence on the temperature curve in February? For reliably predicting
tomorrow's temperature at 12:00, is it sufficient to know the present temperature, or
would knowledge of the temperature curve during the past two days allow a more
accurate prediction? What influence has time on the trend or covariance function?
Of special importance are those stochastic processes for which the joint distribution
functions (6.1) only depend on the distances between $t_i$ and $t_{i+1}$, i.e., only the relative
positions of $t_1, t_2, \ldots, t_n$ to each other have an impact on the joint distribution of the
random variables $X(t_1), X(t_2), \ldots, X(t_n)$.


Strong Stationarity A stochastic process $\{X(t), t \in T\}$ is said to be strongly
stationary or strictly stationary if for all $n = 1, 2, \ldots$, for any real $\tau$, for all $n$-tuples
$(t_1, t_2, \ldots, t_n)$ with $t_i \in T$ and $t_i + \tau \in T$; $i = 1, 2, \ldots, n$;
and for all $n$-tuples $(x_1, x_2, \ldots, x_n)$, the joint distribution function of the random
vector $(X(t_1), X(t_2), \ldots, X(t_n))$ has property

$$F_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n) = F_{t_1+\tau,\, t_2+\tau,\, \ldots,\, t_n+\tau}(x_1, x_2, \ldots, x_n). \tag{6.16}$$

That means, the probability distribution of a strongly stationary stochastic process is
invariant against absolute time shifts. In particular, by letting $n = 1$ and $t = t_1$,
property (6.16) implies that $F_t(x) = F_{t+\tau}(x)$ for all $\tau$ with arbitrary but fixed $t$ and $x$. That
means $F_t(x)$ actually does not depend on $t$. Hence, for strongly stationary processes
there exists a distribution function $F(x)$, which does not depend on $t$, so that

$$F_t(x) = F(x) \quad \text{for all } t \in T \text{ and } x \in Z. \tag{6.17}$$




Hence, trend and variance function of $\{X(t), t \in T\}$ do not depend on $t$ either:

$$m(t) = E(X(t)) \equiv m, \quad \mathrm{Var}(X(t)) \equiv \sigma^2 \tag{6.18}$$

(given that the parameters $m$ and $\sigma^2$ exist). The trend function of a strongly
stationary process is, therefore, a parallel to the time axis, and the fluctuations of its sample
paths around the trend function experience no systematic changes with increasing $t$.


What influence has the strong stationarity of a stochastic process on its covariance 
function? 


To answer this question, the special values $n = 2$, $t_1 = 0$, $t_2 = t - s$, and the time
shift $s$ (in place of $\tau$) are substituted in (6.16). This yields for all $s < t$,

$$F_{0,\, t-s}(x_1, x_2) = F_{s,\, t}(x_1, x_2),$$

i.e. the joint distribution function of the random vector $(X(s), X(t))$, and, therefore, the
mean value of the product $X(s)X(t)$, depend only on the difference $\tau = t - s$, and not on
the absolute values of $s$ and $t$. Hence, by formulas (6.4) and (6.18), $C(s, t)$ must have
the same property:

$$C(s, t) = C(s, s + \tau) = C(0, \tau) = C(\tau).$$

Thus, the covariance function of strongly stationary processes depends only on one
variable:

$$C(\tau) = \mathrm{Cov}(X(s), X(s + \tau)) \quad \text{for all } s \in T. \tag{6.19}$$

Since the covariance function $C(s, t)$ of any stochastic process is symmetric in the
variables $s$ and $t$, the covariance function of a strongly stationary process is a
symmetric function with symmetry center $\tau = 0$, i.e. $C(\tau) = C(-\tau)$ or, equivalently,

$$C(\tau) = C(|\tau|). \tag{6.20}$$


In practical situations it is generally not possible to determine the probability
distributions of all possible random vectors $(X(t_1), X(t_2), \ldots, X(t_n))$ in order to check whether
a stochastic process is strongly stationary or not. But the user of stochastic processes
is frequently satisfied with the validity of properties (6.18) and (6.19). Hence, based
on these two properties, another concept of stationarity has been introduced. It is,
however, only defined for second-order processes:


Second-Order Process A stochastic process $\{X(t), t \in T\}$ is called a second-order
process if

$$E(X^2(t)) < \infty \quad \text{for all } t \in T. \tag{6.21}$$

The existence of the second moments of $X(t)$ as required by assumption (6.21) implies
the existence of the covariance function $C(s, t)$ for all $s$ and $t$, and, therefore, the
existence of the variances $\mathrm{Var}(X(t))$ and mean values $E(X(t))$ for all $t \in T$ (see inequality
of Schwarz (5.11), page 195). (In deriving (6.20) we have implicitly assumed the
existence of the second moments $E(X^2(t))$ without referring to it.)
Weak Stationarity A stochastic process $\{X(t), t \in T\}$ is said to be weakly
stationary if it is a second order process and has properties (6.18) and (6.19):

1) $m(t) \equiv m$ for all $t \in T$.

2) $C(\tau) = \mathrm{Cov}(X(s), X(s + \tau))$ for all $s \in T$.

From (6.18) and (6.19) with $\tau = 0$:

$$\mathrm{Var}(X(t)) = C(0) = \sigma^2. \tag{6.22}$$

The covariance function $C(\tau)$ of a weakly stationary process has two characteristic
properties (without proof):

1) $|C(\tau)| \le \sigma^2$ for all $\tau$,

2) $C(\tau)$ is positive semi-definite, i.e. for all $n$, all real numbers $a_1, a_2, \ldots, a_n$, and for
all $t_1, t_2, \ldots, t_n$; $t_i \in T$,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i\, a_j\, C(t_i - t_j) \ge 0.$$


A strongly stationary process is not necessarily weakly stationary, since there are 
strongly stationary processes, which are not second order processes. But, if a second 
order process is strongly stationary, then, as shown above, it is also weakly stationary. 
Weakly stationary processes are also called wide-sense stationary, covariance statio- 
nary, or second-order stationary. 
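
The two characteristic properties of the covariance function of a weakly stationary process can be probed numerically. The sketch below is a spot check, not a proof. As a test case it uses $C(\tau) = \sigma^2 \cos \omega\tau$, the covariance function obtained in example 6.7; the random choice of coefficients and time points and all function names are assumptions made for illustration.

```python
import math
import random

def cov_cosine(tau, sigma2=1.0, omega=1.0):
    """Test case: covariance function C(tau) = sigma^2 cos(omega tau)."""
    return sigma2 * math.cos(omega * tau)

def quadratic_form(cov, coeffs, times):
    """Double sum over i and j of a_i a_j C(t_i - t_j)."""
    return sum(ai * aj * cov(ti - tj)
               for ai, ti in zip(coeffs, times)
               for aj, tj in zip(coeffs, times))

rng = random.Random(4)
smallest = float("inf")
for _ in range(200):
    n = rng.randint(1, 6)
    coeffs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    times = [rng.uniform(0.0, 10.0) for _ in range(n)]
    smallest = min(smallest, quadratic_form(cov_cosine, coeffs, times))

print(smallest)  # should not fall below zero beyond rounding error
```

For this particular $C(\tau)$ the double sum equals $\sigma^2\,[(\sum_i a_i \cos \omega t_i)^2 + (\sum_i a_i \sin \omega t_i)^2]$, so nonnegativity can also be seen directly.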

Further important properties of stochastic processes are based on properties of their 
increments: 

The increment of a stochastic process $\{X(t), t \in T\}$ with respect to the interval $[t_1, t_2)$
is the difference $X(t_2) - X(t_1)$.

Hence, the variogram $\gamma(s, t)$ as defined by (6.11) is one half of the second moment of the
increment $X(t) - X(s)$.


Homogeneous Increments A stochastic process $\{X(t), t \in T\}$ is said to have
homogeneous or stationary increments if for arbitrary, but fixed $t_1, t_2 \in T$ the increment
$X(t_2 + \tau) - X(t_1 + \tau)$ has the same probability distribution for all values of $\tau$ with
property $t_1 + \tau \in T$, $t_2 + \tau \in T$.


An equivalent definition of processes with homogeneous increments is:

The stochastic process $\{X(t), t \in T\}$ has homogeneous increments if the probability
distribution of the increments $X(t + \tau) - X(t)$ does not depend on $t$ for any fixed
$\tau$; $t, t + \tau \in T$.

Thus, the development in time of a stochastic process with homogeneous increments 
in any interval of the same length is governed by the same probability distribution. 
This motivates the term stationary increments. 

A stochastic process with homogeneous (stationary) increments need not be station- 
ary in any sense. 
Taking into account (6.22), the variogram of a weakly stationary process has a simple
structure:

$$\gamma(s, s + \tau) = \tfrac{1}{2}\, E\big([X(s) - X(s + \tau)]^2\big)$$

$$= \tfrac{1}{2}\, E\big([(X(s) - m) - (X(s + \tau) - m)]^2\big)$$

$$= \tfrac{1}{2}\, E\big((X(s) - m)^2 - 2\,(X(s) - m)(X(s + \tau) - m) + (X(s + \tau) - m)^2\big)$$

$$= \tfrac{1}{2}\sigma^2 - C(\tau) + \tfrac{1}{2}\sigma^2,$$

so that

$$\gamma(\tau) = \sigma^2 - C(\tau).$$

Therefore, in case of a weakly stationary process, the variogram does not yield
additional information on the process compared to the covariance function.


Independent Increments A stochastic process $\{X(t), t \in T\}$ has independent
increments if for all $n = 2, 3, \ldots$ and for all $n$-tuples $(t_1, t_2, \ldots, t_n)$ with $t_1 < t_2 < \cdots < t_n$,
$t_i \in T$, the increments

$$X(t_2) - X(t_1),\ X(t_3) - X(t_2),\ \ldots,\ X(t_n) - X(t_{n-1})$$

are independent random variables.
The meaning of this concept is that the development of the process in an interval $I$ has
no influence on the development of the process on intervals which are disjoint to $I$.
Thus, when the price of a share is governed by a process with independent increments
and there was a sharp increase in year $n$, then this information is worthless with regard
to predicting the development of the share price in year $n + 1$.
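
A simple process with independent (and homogeneous) increments is the symmetric random walk in discrete time. The sketch below, with hypothetical names and parameter values chosen here, estimates the correlation between the increments over two disjoint intervals. A correlation near zero is consistent with independence, although vanishing correlation alone would not prove it.

```python
import random

def random_walk(n_steps, rng):
    """Partial sums of independent +1/-1 steps, starting at X(0) = 0."""
    x, path = 0, [0]
    for _ in range(n_steps):
        x += 1 if rng.random() < 0.5 else -1
        path.append(x)
    return path

def increment_correlation(n_paths=20_000, seed=2):
    """Sample correlation between X(10) - X(0) and X(20) - X(10)."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_paths):
        path = random_walk(20, rng)
        first.append(path[10] - path[0])
        second.append(path[20] - path[10])
    n = len(first)
    mf, ms = sum(first) / n, sum(second) / n
    cov = sum((f - mf) * (s - ms) for f, s in zip(first, second)) / n
    var_f = sum((f - mf) ** 2 for f in first) / n
    var_s = sum((s - ms) ** 2 for s in second) / n
    return cov / (var_f * var_s) ** 0.5

rho_hat = increment_correlation()
print(rho_hat)  # expected to be close to 0
```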


Gaussian Process A stochastic process $\{X(t), t \in T\}$ is a Gaussian process if the
random vectors $(X(t_1), X(t_2), \ldots, X(t_n))$ have a joint normal (Gaussian) distribution
for all $n$-tuples $(t_1, t_2, \ldots, t_n)$ with $t_i \in T$ and $t_1 < t_2 < \cdots < t_n$; $n = 1, 2, \ldots$.


Gaussian processes have an important property:
A Gaussian process is strongly stationary if and only if it is weakly stationary.
Gaussian processes will play an important role in Chapter 11.


Markov Process A stochastic process $\{X(t), t \in T\}$ has the Markov(ian) property if
for all $(n+1)$-tuples $(t_1, t_2, \ldots, t_{n+1})$ with $t_i \in T$ and $t_1 < t_2 < \cdots < t_{n+1}$, and for any
$A_i \subseteq Z$; $i = 1, 2, \ldots, n + 1$;

$$P(X(t_{n+1}) \in A_{n+1} \mid X(t_n) \in A_n, X(t_{n-1}) \in A_{n-1}, \ldots, X(t_1) \in A_1) = P(X(t_{n+1}) \in A_{n+1} \mid X(t_n) \in A_n). \tag{6.23}$$

The Markov property can be interpreted as follows: If $t_{n+1}$ is a time point in the
future, $t_n$ the present time point, and, correspondingly, $t_1, t_2, \ldots, t_{n-1}$ are time points
in the past, then the future development of a process having the Markov property
does not depend on its evolution in the past, but only on its present state. Stochastic
processes having the Markov property are called Markov processes.


A Markov process with finite or countably infinite parameter space T is called a dis- 
crete-time Markov process. Otherwise it is called a continuous-time Markov process. 
Markov processes with finite or countably infinite state spaces Z are called Markov 
chains. Thus, a discrete-time Markov chain has both a discrete state space and a dis- 
crete parameter space. Deviations from this terminology can be found in the literature. 


Markov processes play an important role in all sorts of applications, mainly for four
reasons: 1) Many practical phenomena can be modeled by Markov processes. 2) The
input necessary for their practical application is generally easier to provide than
the necessary input for other classes of stochastic processes. 3) Computer algorithms
are available for numerical evaluations. 4) Stochastic processes $\{X(t), t \in T\}$ with
independent increments and parameter space $T = [0, \infty)$ always have the Markov
property. The practical importance of Markov processes is illustrated by numerous
examples in chapters 8 and 9.
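
The Markov property (6.23) can be illustrated with a two-state chain in discrete time. In the sketch below the transition probabilities, the starting state, and all names are hypothetical choices made for illustration. It estimates the probability of $X(3) = 1$ given $X(2) = 1$ for two different pasts $X(1) = 0$ and $X(1) = 1$; by the Markov property both relative frequencies should be close to the same one-step transition probability.

```python
import random

# Hypothetical one-step probabilities P(next state = 1 | current state).
P_TO_ONE = {0: 0.2, 1: 0.7}

def simulate_chain(n_steps, rng, start=0):
    """Simulate X(0), X(1), ..., X(n_steps) of a two-state Markov chain."""
    state, path = start, [start]
    for _ in range(n_steps):
        state = 1 if rng.random() < P_TO_ONE[state] else 0
        path.append(state)
    return path

def conditional_frequency(past_state, n_paths=60_000, seed=3):
    """Relative frequency of X(3) = 1 among paths with X(2) = 1 and X(1) = past_state."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n_paths):
        path = simulate_chain(3, rng)
        if path[2] == 1 and path[1] == past_state:
            total += 1
            hits += path[3]
    return hits / total

f_past0 = conditional_frequency(0)
f_past1 = conditional_frequency(1)
print(f_past0, f_past1, P_TO_ONE[1])  # both frequencies should be near 0.7
```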


Theorem 6.1 A Markov process is strongly stationary if and only if its one-dimensional
probability distribution does not depend on time, i.e., if there exists a distribution
function $F(x)$ with

$$F_t(x) = P(X(t) \le x) = F(x) \quad \text{for all } t \in T. \qquad \blacksquare$$

Thus, condition (6.17), which is necessary for any stochastic process to be strongly
stationary, is necessary and sufficient for a Markov process to be strongly stationary.


Mean-Square Continuous A second order process $\{X(t), t \in T\}$ is said to be
mean-square continuous at point $t = t_0 \in T$ if

$$\lim_{h \to 0} E\big([X(t_0 + h) - X(t_0)]^2\big) = 0. \tag{6.24}$$

The process $\{X(t), t \in T\}$ is said to be mean-square continuous in the region $T_0$,
$T_0 \subseteq T$, if it is mean-square continuous at all points $t \in T_0$.


According to section 5.2.1 (page 205), the convergence used in (6.24) is called
convergence in mean square. There is a simple criterion for a second order stochastic
process to be mean-square continuous at $t_0$:

A second order process $\{X(t), t \in T\}$ is mean-square continuous at $t_0$ if and only
if its covariance function $C(s, t)$ is continuous at $(s, t) = (t_0, t_0)$.

As a corollary from this statement:

A weakly stationary process $\{X(t), t \in (-\infty, +\infty)\}$ is mean-square continuous in
$(-\infty, +\infty)$ if and only if it is mean-square continuous at time point $t = 0$.
The following two examples make use of two formulas from trigonometry:

$$\cos\alpha \cos\beta = \tfrac{1}{2}\,[\cos(\beta - \alpha) + \cos(\alpha + \beta)],$$

$$\cos(\beta - \alpha) = \cos\alpha \cos\beta + \sin\alpha \sin\beta.$$


Example 6.6 (cosine wave with random amplitude and random phase) In modifying
example 6.3, let

$$X(t) = A \cos(\omega t + \Phi),$$

where $A$ is a nonnegative random variable with finite mean value and finite variance.
The random parameter $\Phi$ is assumed to be uniformly distributed over $[0, 2\pi]$ and
independent of $A$. The stochastic process $\{X(t), t \in (-\infty, +\infty)\}$ can be thought of as the
output of an oscillator, selected from a set of oscillators of the same kind, which have
been turned on at different times (see, e.g., Helstrom (1989)). Since

$$E(\cos(\omega t + \Phi)) = \frac{1}{2\pi} \int_0^{2\pi} \cos(\omega t + \varphi)\, d\varphi = \frac{1}{2\pi}\, \big[\sin(\omega t + \varphi)\big]_0^{2\pi} = 0,$$

the trend function of this process is identically zero:

$$m(t) \equiv 0.$$

Its covariance function is

$$C(s, t) = E\{[A \cos(\omega s + \Phi)]\,[A \cos(\omega t + \Phi)]\}$$

$$= E(A^2)\, \frac{1}{2\pi} \int_0^{2\pi} \cos(\omega s + \varphi) \cos(\omega t + \varphi)\, d\varphi$$

$$= E(A^2)\, \frac{1}{2\pi} \int_0^{2\pi} \tfrac{1}{2}\, \{\cos \omega(t - s) + \cos[\omega(s + t) + 2\varphi]\}\, d\varphi.$$

The first term of the integrand is constant with respect to $\varphi$. Since the integral of the
second term is zero, $C(s, t)$ depends only on the difference $\tau = t - s$:

$$C(\tau) = \tfrac{1}{2}\, E(A^2) \cos \omega\tau.$$

Thus, the process is weakly stationary. □
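
The result of example 6.6 can be checked by simulation. The following sketch assumes, for illustration only, that $A$ is uniformly distributed on $[0, 2]$, so that $E(A^2) = 4/3$; the names and parameter values are hypothetical. It estimates the covariance for the same lag $\tau$ at two different absolute times and compares both estimates with $\tfrac{1}{2} E(A^2) \cos \omega\tau$.

```python
import math
import random

def estimate_cov(s, t, omega=2.0, n_paths=50_000, seed=6):
    """Monte Carlo estimate of C(s, t) for X(t) = A cos(omega t + Phi).

    Assumptions for illustration: A uniform on [0, 2] (so E(A^2) = 4/3),
    Phi uniform on [0, 2 pi], A and Phi independent.
    """
    rng = random.Random(seed)
    sum_s = sum_t = sum_st = 0.0
    for _ in range(n_paths):
        amp = rng.uniform(0.0, 2.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        xs = amp * math.cos(omega * s + phi)
        xt = amp * math.cos(omega * t + phi)
        sum_s += xs
        sum_t += xt
        sum_st += xs * xt
    return sum_st / n_paths - (sum_s / n_paths) * (sum_t / n_paths)

omega, tau = 2.0, 0.4
theory = 0.5 * (4.0 / 3.0) * math.cos(omega * tau)
c_early = estimate_cov(0.3, 0.3 + tau)  # lag tau, starting at s = 0.3
c_late = estimate_cov(5.0, 5.0 + tau)   # same lag, starting at s = 5.0
print(c_early, c_late, theory)
```

Both estimates agreeing with the same theoretical value, regardless of the starting time, is what weak stationarity predicts.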


Example 6.7 Let the stochastic process $\{X(t), t \in (-\infty, +\infty)\}$ be defined by

$$X(t) = A \cos \omega t + B \sin \omega t,$$

where $A$ and $B$ are two uncorrelated random variables satisfying

$$E(A) = E(B) = 0 \quad \text{and} \quad \mathrm{Var}(A) = \mathrm{Var}(B) = \sigma^2 < \infty.$$

Since $\mathrm{Var}(X(t)) = \sigma^2 < \infty$ for all $t$, $\{X(t), t \in (-\infty, +\infty)\}$ is a second order process. Its
trend function is identically zero: $m(t) \equiv 0$. Thus,

$$C(s, t) = E(X(s)X(t)).$$

For $A$ and $B$ being uncorrelated, $E(AB) = E(A)\,E(B) = 0$. Hence,

$$C(s, t) = E(A^2 \cos \omega s \cos \omega t + B^2 \sin \omega s \sin \omega t) + E(AB \cos \omega s \sin \omega t + AB \sin \omega s \cos \omega t)$$

$$= \sigma^2 (\cos \omega s \cos \omega t + \sin \omega s \sin \omega t) + E(AB)\,(\cos \omega s \sin \omega t + \sin \omega s \cos \omega t)$$

$$= \sigma^2 \cos \omega(t - s).$$

Thus, the covariance function depends only on the difference $\tau = t - s$:

$$C(\tau) = \sigma^2 \cos \omega\tau,$$

so that the process $\{X(t), t \in (-\infty, +\infty)\}$ is weakly stationary. □
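
A simulation sketch for example 6.7 follows. It assumes, beyond what the example requires, that $A$ and $B$ are independent normal random variables with mean 0 and standard deviation $\sigma = 1.5$; the function name and parameter values are hypothetical. Besides the covariance $C(\tau) = \sigma^2 \cos \omega\tau$, it also estimates the variogram (6.11) at the same lag, which for a weakly stationary process should be close to $\sigma^2 - C(\tau)$.

```python
import math
import random

def estimate_cov_and_variogram(s, tau, sigma=1.5, omega=1.0,
                               n_paths=50_000, seed=7):
    """Monte Carlo estimates of C(s, s + tau) and gamma(s, s + tau).

    Assumption for illustration: A and B are independent N(0, sigma^2).
    """
    rng = random.Random(seed)
    sum_prod = sum_sq_diff = 0.0
    for _ in range(n_paths):
        a = rng.gauss(0.0, sigma)
        b = rng.gauss(0.0, sigma)
        x_s = a * math.cos(omega * s) + b * math.sin(omega * s)
        x_t = a * math.cos(omega * (s + tau)) + b * math.sin(omega * (s + tau))
        sum_prod += x_s * x_t            # trend is zero, so E(X(s)X(t)) = C(s, t)
        sum_sq_diff += (x_t - x_s) ** 2  # squared increment for the variogram
    return sum_prod / n_paths, 0.5 * sum_sq_diff / n_paths

sigma, omega, tau = 1.5, 1.0, 1.0
c_theory = sigma ** 2 * math.cos(omega * tau)
c_hat, gamma_hat = estimate_cov_and_variogram(s=2.0, tau=tau)
print(c_hat, c_theory)                   # compare with sigma^2 cos(omega tau)
print(gamma_hat, sigma ** 2 - c_theory)  # compare with sigma^2 - C(tau)
```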


Example 6.8 (randomly delayed pulse code modulation) Based on the stochastic
process $\{X(t), t \in (-\infty, +\infty)\}$ defined in example 6.4, the stochastic process

$$\{Y(t), t \in (-\infty, +\infty)\} \quad \text{with} \quad Y(t) = X(t - Z)$$

is introduced, where $Z$ is uniformly distributed over $[0, 1]$. When shifting the sample
paths of the process $\{X(t), t \in (-\infty, +\infty)\}$ $Z$ time units to the right, one obtains the
corresponding sample paths of the process $\{Y(t), t \in (-\infty, +\infty)\}$. For instance, shifting
the section of the sample path shown in Figure 6.3 $Z = z$ time units to the right yields
the corresponding section of the sample path of the process $\{Y(t), t \in (-\infty, +\infty)\}$
depicted in Figure 6.4.
The trend function of the process $\{Y(t), t \in (-\infty, +\infty)\}$ is

$$m(t) \equiv a(1 - p).$$

To determine the covariance function, let $B = B(s, t)$ denote the random event that
$Y(s)$ and $Y(t)$ are separated by a switching point $n + Z$; $n = 0, \pm 1, \pm 2, \ldots$. Then, for $|t - s| \le 1$,

$$P(B) = |t - s|, \quad P(\bar{B}) = 1 - |t - s|.$$

The random variables $Y(s)$ and $Y(t)$ are independent if $|t - s| > 1$ and/or $B$ occurs.
Therefore,

$$C(s, t) = 0 \quad \text{if } |t - s| > 1.$$

If |t—s| <1, X(s) and X(f) are only then independent if B occurs. Hence, the covar- 
iance function of { Y(f), t € (—00,+00)} given |t—s| <1 can be obtained as follows: 


vw) 
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Figure 6.4 Randomly delayed pulse code modulation 
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Figure 6.5 Covariance function of the randomly delayed pulse code modulation 


C(s, t) = E(X(s) X(0)|B)P(B) + E(X(s) X(0)|B) P(B) — m(s) m(2) 
= E(X(s)) E(X() P(B) + E(LX(s)]°) P(B) — m(s) m(0) 
=[a(1 —p)]?|t-s] +a? —p)(1 - |tsl)- [a -p)]. 
Finally, with t= ¢-—s, the covariance function becomes 
_fa*p-p)-ltl) for In| <1 
CUye {s elsewhere 


The process { Y(t), t € (-«,+00)} is weakly stationary. Analogously to the transition 
from example 6.3 to example 6.6, stationarity is achieved by introducing a uniformly 
distributed phase shift in the pulse code modulation of example 6.4. oO 
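The simplification of the three-term expression to the closed form can be checked numerically. The Python sketch below evaluates both; the function names and the values of a and p are illustrative assumptions, and the moments $E(X(s)) = a(1-p)$ and $E([X(s)]^2) = a^2(1-p)$ are taken from the derivation above.

```python
def cov_three_terms(tau, a, p):
    """Covariance for |tau| <= 1, term by term as in the derivation."""
    d = abs(tau)
    m = a * (1.0 - p)                    # trend a(1 - p)
    second_moment = a * a * (1.0 - p)    # E([X(s)]^2) = a^2 (1 - p)
    return m * m * d + second_moment * (1.0 - d) - m * m

def cov_closed_form(tau, a, p):
    """C(tau) = a^2 p (1 - p)(1 - |tau|) for |tau| <= 1, and 0 elsewhere."""
    d = abs(tau)
    return a * a * p * (1.0 - p) * (1.0 - d) if d <= 1.0 else 0.0

a, p = 2.0, 0.3                          # illustrative amplitude and probability
for tau in (-1.0, -0.5, 0.0, 0.25, 1.0):
    print(tau, cov_three_terms(tau, a, p), cov_closed_form(tau, a, p))
```

Both columns agree for every $|\tau| \le 1$, and the closed form vanishes at $|\tau| = 1$, so the covariance function is continuous there.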


6.4 TIME SERIES IN DISCRETE TIME 


6.4.1 Introduction 


All examples in sections 6.2 and 6.3 dealt with stochastic processes in continuous 
time. In this section, examples for discrete-time processes are considered, which are 
typical in time-series analysis. The material introduced in the previous sections is 
extended and supplemented with time-series specific terminology and techniques. 


A time series is a realization (trajectory, sample path) of a stochastic process in discrete time $\{X(t_1), X(t_2), \dots\}$. Either the time (parameter) space T of this process is finite, i.e. $T = \{t_1, t_2, \dots, t_n\}$, or only a finite piece of a trajectory of a stochastic process with unbounded time space $T = \{t_1, t_2, \dots\}$ has been observed. Thus, a time series is simply a sequence of real numbers
\[ x_1, x_2, \dots, x_n \]
with the property that the underlying stochastic process has assumed value $x_i$ at time $t_i$:
\[ X(t_i) = x_i = x(t_i); \quad i = 1, 2, \dots, n. \]

Frequently it is assumed that the $t_1, t_2, \dots, t_n$ are equidistant, i.e.,
\[ t_i = i\,\Delta t; \quad i = 1, 2, \dots, n. \]

If the underlying stochastic process $\{X(t),\ t \in T\}$ is a process in continuous time, it can also give rise to a time series in discrete time, simply by sampling the state of the process at discrete (possibly equidistant) time points. As with stochastic processes, the parameter 'time' in time series need not be the time. Time series occur in all areas where the development of economic, physical, technological, biological, etc. phenomena is controlled by stochastic processes. Hence, for applications of time series, the reader is referred to the introduction of this chapter. Figures 6.1 and 6.2 are actually time series plots. When analyzing time series, the emphasis is on numerical methods for extracting as much information as possible from the time series with regard to trend, seasonal, and random influences as well as prediction, and to a lesser extent on theoretical implications regarding the underlying stochastic process.


In elementary time series analysis, the underlying stochastic process $\{X(t),\ t \in T\}$ is assumed to have a special structure: X(t) is given by the additive superposition of three components:
\[ X(t) = T(t) + S(t) + R(t), \tag{6.25} \]

where T(t) is the trend of the time series and S(t) is a seasonal component. Both T(t) and S(t) are deterministic functions of t, whereas R(t) is a random variable, which, in what follows, is assumed to have mean value $E(R(t)) = 0$ for all t. The seasonal component captures periodic fluctuations of the observations as they commonly arise when observing, e.g., meteorological parameters such as temperature and rainfall over time. This means that a single observation of the process $\{X(t),\ t \in T\}$ made at time t has structure
\[ x(t) = T(t) + S(t) + r(t), \tag{6.26} \]
where r(t) is a realization of the random variable R(t).


As a numerical example for a time series, Table 6.1 shows the average of the daily 
maximum temperatures per month in Johannesburg over a time period of 24 months 
(in °C) and Figure 6.6 the corresponding time series plot. The effect of a seasonal 
component is clearly visible. 


It may make sense to add other deterministic components to the model (6.25), for 
instance, a component which takes into account short-time cyclic fluctuations of the 
observations, e.g. systematic fluctuations of the temperature during a day or long-time 
cyclic changes in the electromagnetic radiation of the sun due to the roughly 11-year period
of sunspot fluctuations. It depends on what information is wanted. If the averages of 
the daily maximum temperatures are of interest, then the fluctuations of the tempera- 
ture during a day are not relevant. If the oxygen content in the water of a river is 
measured against the time, then two additional components in (6.25), namely the 
water temperature and the speed of the running water, should be included. This short 
section is based on the model (6.25) for the structure of a time series. 

Month i |    1 |    2 |    3 |    4 |    5 |    6 |    7 |    8 |    9 |   10 |   11 |   12
x_i     | 26.3 | 25.6 | 24.3 | 22.1 | 19.1 | 16.5 | 16.4 | 19.8 | 22.8 | 25.0 | 25.3 | 26.1
Month i |   13 |   14 |   15 |   16 |   17 |   18 |   19 |   20 |   21 |   22 |   23 |   24
x_i     | 27.4 | 26.3 | 24.8 | 22.4 | 18.6 | 16.7 | 15.9 | 20.2 | 23.4 | 24.2 | 25.9 | 27.0

Table 6.1 Monthly average maximal temperature in Johannesburg

Figure 6.6 Time plot to Table 6.1

The reader will have noticed that the term trend has slightly different meanings in stochastic processes and in time series analysis:

a) The trend of a stochastic process $\{X(t),\ t \in T\}$ is the mean value $m(t) = E(X(t))$ as a function of time. Hence, a stochastic process of structure (6.25) has trend function
\[ m(t) = T(t) + S(t), \]
since, by assumption, $E(R(t)) \equiv 0$.

b) In time series analysis, the trend T(t) gives information on the average development of the observations in the long run. More exactly, the trend of a time series can in principle be obtained by excluding all possible sources of variation of the observations (deterministic and random ones in model (6.25)). Numerical methods for doing this are proposed later.

Note If T(t) is parallel to the t-axis, then time series analysts say 'the time series has no trend'. This terminology should not be extended to the trend functions m(t) of stochastic processes. A constant trend function is after all a trend function as well.


6.4.2 Smoothing of Time Series 


Smoothing techniques are simple and efficient methods to partially or completely 'level out' deterministic and/or random fluctuations within observed time series, and in doing so they provide information on the trend T(t) of a time series. The idea behind smoothing is a technique which is well established in the theory of linear systems, where it is called filtering. Its basis is a linear filter, which transforms a given time series $\{x_i\} = \{x_0, x_1, \dots, x_n\}$ of length n + 1 into a sequence
\[ \{y_k\} = \{y_a, y_{a+1}, \dots, y_{n-b}\} \]
of length n + 1 − a − b as follows:
\[ y_k = \sum_{i=k-a}^{k+b} w_{i-k}\,x_i; \quad k = a, a+1, \dots, n-b, \quad 0 \le a, b \le n, \tag{6.27} \]
or
\[ y_k = w_{-a}x_{k-a} + w_{-a+1}x_{k-a+1} + \cdots + w_b x_{k+b}; \quad k = a, a+1, \dots, n-b. \]

The parameters $w_i$ are the weights assigned to the respective observations, whereas the interval [−a, b] determines the bandwidth of the filter. The weights will usually be positive, but can also be negative. They must satisfy the normalizing condition
\[ \sum_{i=-a}^{b} w_i = 1. \tag{6.28} \]
To illustrate the filter, let a = b = 2. Then (6.27) becomes
\[ y_k = w_{-2}x_{k-2} + w_{-1}x_{k-1} + w_0 x_k + w_1 x_{k+1} + w_2 x_{k+2}. \]

Thus, $y_k$ is calculated as the weighted sum of the values which the time series $\{x_i\}$ assumes at time points k − 2, k − 1, k, k + 1, and k + 2. It is obvious that in this way a 'smoother' sequence than $\{x_i\}$ is generated, i.e. $\{y_k\}$ will exhibit fewer fluctuations, and its fluctuations will on average have smaller amplitudes than those of $\{x_i\}$. Bandwidth and weights have to be chosen according to the aim of smoothing. If the aim is to level out periods of seasonal influence in order, e.g., to get information on the trend of $\{x_i\}$, then a large bandwidth must be applied. The weights $w_i$ should generally be chosen in such a way that the influence of the $x_i$ on the value of $y_k$ decreases with increasing time distance $|t_k - t_i|$ of $x_i$ to $y_k$.
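A minimal Python sketch of the linear filter (6.27) may make the indexing concrete. The function name `linear_filter`, the observations, and the weights are made up for illustration; the weight list is assumed to be ordered as $w_{-a}, \dots, w_0, \dots, w_b$.

```python
def linear_filter(x, weights, a, b):
    """Apply y_k = sum_{j=-a}^{b} w_j * x_{k+j} for k = a, ..., n - b.

    x       : observations x_0, ..., x_n
    weights : [w_{-a}, ..., w_0, ..., w_b], must sum to 1
    Returns the list [y_a, ..., y_{n-b}] of length n + 1 - a - b.
    """
    if len(weights) != a + b + 1:
        raise ValueError("need exactly a + b + 1 weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(x) - 1
    return [
        sum(w * x[k - a + j] for j, w in enumerate(weights))
        for k in range(a, n - b + 1)
    ]

x = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]       # made-up observations x_0, ..., x_5
w = [0.1, 0.2, 0.4, 0.2, 0.1]            # made-up weights w_{-2}, ..., w_2
y = linear_filter(x, w, a=2, b=2)        # yields y_2 and y_3
print(y)
```

With a = b = 2 and six observations, only $y_2$ and $y_3$ can be formed, which illustrates the loss of a + b values at the ends of the series.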


Moving Averages A simple special case of (6.27) is to assume a = b and
\[
w_i =
\begin{cases}
\dfrac{1}{2b+1} & \text{for } i = -b, -b+1, \dots, b-1, b, \\
0 & \text{otherwise.}
\end{cases}
\]

This case is denoted as M.A.(2b+1). The corresponding bandwidth is [−b, +b] and comprises 2b + 1 time points.

Special cases: 1) If b = 1, then $y_k$ is calculated from three observations (M.A.(3)):
\[ y_k = \tfrac{1}{3}\,[x_{k-1} + x_k + x_{k+1}]. \]
2) If b = 2, then $y_k$ is calculated from 5 observations (M.A.(5)):
\[ y_k = \tfrac{1}{5}\,[x_{k-2} + x_{k-1} + x_k + x_{k+1} + x_{k+2}]. \]

Frequently, the time point k is interpreted as the present, so that time points smaller than k belong to the past and time points greater than k to the future. Particularly interesting is the case when $y_k$ is calculated from the present value and past values of $\{x_i\}$. This case is given by (6.27) with b = 0. For instance, with a = 2 and equal weights,
\[ y_k = \tfrac{1}{3}\,[x_k + x_{k-1} + x_{k-2}]. \]
In this case it makes sense to interpret $y_k$ as a prediction of the unknown value $x_{k+1}$.
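As an illustration, the centered moving average can be applied to the monthly values of Table 6.1. The sketch below is one possible implementation; the function name `moving_average` is an assumption, and the list is 0-based, so `temps[0]` holds the value for month 1.

```python
def moving_average(x, b):
    """Centered moving average M.A.(2b+1): y_k = mean of x_{k-b}, ..., x_{k+b}."""
    width = 2 * b + 1
    return [sum(x[k - b:k + b + 1]) / width for k in range(b, len(x) - b)]

# Monthly values x_1, ..., x_24 from Table 6.1 (stored 0-based here).
temps = [26.3, 25.6, 24.3, 22.1, 19.1, 16.5, 16.4, 19.8, 22.8, 25.0, 25.3, 26.1,
         27.4, 26.3, 24.8, 22.4, 18.6, 16.7, 15.9, 20.2, 23.4, 24.2, 25.9, 27.0]

ma3 = moving_average(temps, b=1)         # smoothed values for months 2, ..., 23
print([round(v, 1) for v in ma3[:4]])
```

The first and last b months have no smoothed value, so M.A.(3) returns 22 values for the 24 observations.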


Smoothing with the Discrete Epanechnikov Kernel The Epanechnikov kernel is given by bandwidth [−b, b] and weights
\[ w_i = c\left[1 - \frac{i^2}{(b+1)^2}\right] \quad\text{for } i = 0, \pm 1, \dots, \pm b. \]
The factor c makes sure that condition (6.28) is fulfilled:
\[ c = \left[1 + \frac{b(4b+5)}{3(b+1)}\right]^{-1}. \]
For instance, if b = 2, then c ≈ 0.257 and $y_k$ is given by
\[
\begin{aligned}
y_k &= w_{-2}x_{k-2} + w_{-1}x_{k-1} + w_0 x_k + w_1 x_{k+1} + w_2 x_{k+2} \\
&= [\,0.556\,x_{k-2} + 0.889\,x_{k-1} + x_k + 0.889\,x_{k+1} + 0.556\,x_{k+2}\,]\;c.
\end{aligned}
\]
This filter is convenient for numerical calculations: 1) Its input is fully determined by its bandwidth parameter b, and 2) the weights have the symmetry property $w_{-i} = w_i$. Moreover, the observation $x_k$ has the strongest impact on $y_k$, and the impact of the $x_i$ on $y_k$ becomes smaller with increasing distance of $t_i$ to $t_k$. The larger the parameter b, the stronger is the smoothing effect.
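The weights and the normalizing factor c can be computed directly from b. The following Python sketch does this; the function name `epanechnikov_weights` is an illustrative assumption.

```python
def epanechnikov_weights(b):
    """Weights w_{-b}, ..., w_b of the discrete Epanechnikov kernel."""
    c = 1.0 / (1.0 + b * (4 * b + 5) / (3.0 * (b + 1)))
    return [c * (1.0 - i * i / (b + 1) ** 2) for i in range(-b, b + 1)]

w = epanechnikov_weights(2)
print([round(v, 3) for v in w], round(sum(w), 6))
```

For b = 2 the middle weight equals c = 9/35 ≈ 0.257, and the weights sum to 1 as required by (6.28).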


Exponential (Geometrical) Smoothing This type of smoothing uses all the 'past' values and the 'present' value of the given time series $\{x_0, x_1, \dots, x_n\}$ to calculate $y_k$ from the observations $x_k, x_{k-1}, \dots, x_0$ in the following way:
\[ y_k = \lambda c(k)\,x_k + \lambda(1-\lambda)c(k)\,x_{k-1} + \cdots + \lambda(1-\lambda)^k c(k)\,x_0, \quad k = 0, 1, \dots, n, \tag{6.29} \]
where the parameter λ satisfies 0 < λ < 1. Hence, the weights are
\[ w_{-i} = \lambda(1-\lambda)^i c(k) \quad\text{for } i = k, k-1, \dots, 1, 0. \]

The bandwidth limitation a = a(k) = k + 1 depends on k, whereas b = 0. The factor c(k) ensures that condition (6.28) is fulfilled (apply formula (2.18) with x = 1 − λ):
\[ c(k) = \frac{1}{1 - (1-\lambda)^{k+1}}. \tag{6.30} \]

Since c(0) = 1/λ and c(1) = 1/(λ(2 − λ)), smoothing starts with $y_0 = x_0$, and
\[ y_1 = \frac{1}{2-\lambda}\,x_1 + \frac{1-\lambda}{2-\lambda}\,x_0 = \frac{1}{2-\lambda}\,\big(x_1 + (1-\lambda)\,x_0\big). \]

A strong smoothing of $\{x_i\}$ will be achieved with small values of λ, since in this case even the 'more distant' values have a non-negligible effect on $y_k$. To achieve the desired result, one should try different values of λ. As a rule of thumb, start with a value between 0.1 and 0.3.

k       |     2 |     4 |     6 |     8 |    10 |    12 |    14 |    16 |    18 |    20 |    22
λ = 0.2 | 2.778 | 1.694 | 1.355 | 1.202 | 1.120 | 1.074 | 1.046 | 1.029 | 1.018 | 1.012 | 1.007
λ = 0.4 | 1.563 | 1.149 | 1.049 | 1.017 | 1.006 | 1.002 | 1.001 | 1.000 | 1.000 | 1.000 | 1.000

Table 6.2 Convergence of c(k) towards 1 with increasing k

Table 6.2 shows that even for fairly small values of λ the factor c(k) tends to 1 rather fast. Therefore, in particular when smoothing long time series (which possibly originated in the 'distant past'), c(k) = 1 is frequently assumed to be true right from the beginning, i.e., for all k = 0, 1, .... Under this assumption, equation (6.29) can be written in the recursive form
\[ y_k = \lambda x_k + (1-\lambda)\,y_{k-1}; \quad y_0 = x_0, \quad k = 1, 2, \dots, n. \tag{6.31} \]
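The exactly normalized form (6.29) and the recursion (6.31) can be compared side by side. The Python sketch below implements both under the assumption $c(k) = 1/(1-(1-\lambda)^{k+1})$, which is the value that makes the k + 1 weights sum to 1; the function names are illustrative.

```python
def normalizing_factor(lam, k):
    """c(k) = 1 / (1 - (1 - lam)**(k + 1)), so that the k + 1 weights sum to 1."""
    return 1.0 / (1.0 - (1.0 - lam) ** (k + 1))

def smooth_exact(x, lam):
    """Exponential smoothing with the exact normalization of (6.29)."""
    y = []
    for k in range(len(x)):
        c = normalizing_factor(lam, k)
        y.append(sum(lam * (1.0 - lam) ** i * c * x[k - i] for i in range(k + 1)))
    return y

def smooth_recursive(x, lam):
    """Recursion y_k = lam * x_k + (1 - lam) * y_{k-1} with y_0 = x_0."""
    y = [x[0]]
    for value in x[1:]:
        y.append(lam * value + (1.0 - lam) * y[-1])
    return y

x = [26.3, 25.6, 24.3, 22.1, 19.1, 16.5]    # first six values of Table 6.1
print([round(v, 2) for v in smooth_exact(x, 0.6)])
print([round(v, 2) for v in smooth_recursive(x, 0.6)])
```

Both variants start with $y_0 = x_0$. They differ slightly for small k, because the recursion places weight $(1-\lambda)^k$ on $x_0$ rather than $\lambda(1-\lambda)^k c(k)$; the difference fades as k grows.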


Table 6.3 gives some general guidelines for the choice of λ when smoothing.

Effect of the choice of λ on:   | λ large | λ small
Smoothing                       | little  | strong
Weights of distant observations | small   | large
Weights of near observations    | large   | small

Table 6.3 Choice of λ in exponential smoothing

Table 6.4 shows once more the original time series $\{x_i\}$ from Table 6.1, the respective sequences $\{y_i\}$ obtained by M.A.(3), by the Epanechnikov kernel (Ep) with b = 2, and by exponential smoothing with λ = 0.6 and (6.31), starting with $y_1 = x_1$ (Ex 0.6). Figure 6.7 illustrates the results for exponential smoothing and for the Epanechnikov approach. With the parameters selected, the sequences $\{y_i\}$ essentially follow the seasonal (periodic) fluctuations, but clearly, the original time series has been smoothed.


Short-Time Forecasting The recursive equation (6.31) provides an easy and efficient way of making short-time predictions: Since $y_k$ only depends on the observations $x_i$ made at time points before or at time k, $y_k$ can be considered an estimate of the value the time series $\{x_i\}$ will assume at time point k + 1. If this estimate is denoted as $\hat{x}_{k+1}$, equation (6.31) can be rewritten as
\[ \hat{x}_{k+1} = \lambda x_k + (1-\lambda)\,y_{k-1}; \quad y_0 = x_0, \quad k = 1, \dots, n. \]


This equation contains all the information on the development of the time series up to time point k, and gives an estimate of the value of the next observation at time k + 1.

Month i |    1 |    2 |    3 |    4 |    5 |    6 |    7 |    8 |    9 |   10 |   11 |   12
x_i     | 26.3 | 25.6 | 24.3 | 22.1 | 19.1 | 16.5 | 16.4 | 19.8 | 22.8 | 25.0 | 25.3 | 26.1
M.A.(3) |      | 25.4 | 24.0 | 21.8 | 19.2 | 17.3 | 17.6 | 19.7 | 22.5 | 24.4 | 25.5 | 26.3
Ep b=2  |      |      | 23.6 | 21.6 | 19.5 | 18.3 | 18.5 | 20.0 | 22.1 | 24.0 | 25.4 | 26.1
Ex 0.6  | 26.3 | 25.9 | 24.9 | 23.2 | 20.7 | 18.2 | 17.1 | 18.7 | 21.2 | 23.5 | 24.6 | 25.5
Month i |   13 |   14 |   15 |   16 |   17 |   18 |   19 |   20 |   21 |   22 |   23 |   24
x_i     | 27.4 | 26.3 | 24.8 | 22.4 | 18.6 | 16.7 | 15.9 | 20.2 | 23.4 | 24.2 | 25.9 | 27.0
M.A.(3) | 26.6 | 26.2 | 24.5 | 21.9 | 19.2 | 17.1 | 17.6 | 19.8 | 22.6 | 24.5 | 25.7 |
Ep b=2  | 26.2 | 25.6 | 24.1 | 21.8 | 19.5 | 18.3 | 18.5 | 20.0 | 22.1 | 24.2 |      |
Ex 0.6  | 26.6 | 26.4 | 25.4 | 23.6 | 20.6 | 18.3 | 16.9 | 18.9 | 21.6 | 23.2 | 24.9 | 26.2

Table 6.4 Data from Table 6.1 and the effect of smoothing

Figure 6.7 Time series plot for Tables 6.1 and 6.4 (original time series, exponential smoothing, and Epanechnikov kernel; temperature in °C against month)


6.4.3 Trend Estimation


To obtain information on the trend T(t) of a time series by smoothing methods, the bandwidths of the M.A. technique and of the Epanechnikov kernel must be sufficiently large to filter out seasonal (periodic) fluctuations. The time series given by Table 6.1, as with most other meteorological and many economic time series, has a period of 12 months. Thus, good smoothing results can be expected with the M.A. technique with b = 12. In case of exponential smoothing, the parameter λ needs to be small enough to achieve good smoothing results. All these techniques require time series which are sufficiently long with respect to the length of the periods of seasonal influences.


Smoothing techniques, however, do not yield the trend as a (continuous) function. But they indicate which type of continuous function can be used to model the trend best. In many cases, a linear trend function
\[ T(t) = \alpha + \beta t \tag{6.32} \]

will give a satisfactory fit, at least piecewise. Thus, when the original time series $\{x_i\}$ has been smoothed to a time series $\{y_i\}$ without seasonal component, the problem of fitting a linear trend function to $\{y_i\}$ is equivalent to determining the empirical regression line to the values $\{y_i\}$. According to formulas (3.46), page 143, estimates for the coefficients α and β are
\[
\hat{\beta} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(t_i - \bar{t})}{\sum_{i=1}^{n} (t_i - \bar{t})^2}
= \frac{\sum_{i=1}^{n} y_i t_i - n\,\bar{y}\,\bar{t}}{\sum_{i=1}^{n} t_i^2 - n\,\bar{t}^{\,2}},
\qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{t}, \tag{6.33}
\]

where the $y_i$, just as the $x_i$, belong to the time points $t_i$. For estimating more complicated trend functions, e.g. polynomial ones of higher order than 1, the use of a statistical software package is recommended.
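The estimates (6.33) take only a few lines of code. The Python sketch below fits the linear trend and forms the residuals $y_i - (\hat{\alpha} + \hat{\beta} t_i)$; the function name `fit_linear_trend` and the data, which lie exactly on a line, are made up for illustration.

```python
def fit_linear_trend(t, y):
    """Least-squares estimates (alpha_hat, beta_hat) of T(t) = alpha + beta * t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    s_ty = sum((yi - y_bar) * (ti - t_bar) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    beta_hat = s_ty / s_tt
    alpha_hat = y_bar - beta_hat * t_bar
    return alpha_hat, beta_hat

# Made-up smoothed values lying exactly on the line 20 + 0.25 t.
t = list(range(1, 9))
y = [20.0 + 0.25 * ti for ti in t]
alpha_hat, beta_hat = fit_linear_trend(t, y)
residuals = [yi - (alpha_hat + beta_hat * ti) for ti, yi in zip(t, y)]
print(alpha_hat, beta_hat)
```

Because the made-up data contain no random component, the fit recovers intercept and slope and all residuals vanish up to rounding. With real smoothed values the residuals carry the remaining random fluctuations.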


Removing the seasonal influences from a time series of structure (6.26) led to the time series $\{y_i\}$. The next step might be to eliminate the influence of the trend from the time series as well. In many cases this can be achieved, at least approximately, by going over from the time series $\{y_i\}$ to the time series $\{r_i\}$ with
\[ r_i = y_i - T(t_i), \quad i = 1, 2, \dots, n, \tag{6.34} \]
where $T(t_i)$ is the value of the trend at time $t_i$ (obtained from the smoothed sequence $\{y_i\}$). Thus, $\{r_i\} = \{r_1, r_2, \dots, r_n\}$ is the time series which arises from the original time series $\{x_i\}$ by eliminating both seasonal influences and trend. Hence, fluctuations within the sequence $\{r_i\}$ are purely due to random influences on the development of the time series. The sequence $\{r_i\}$ is frequently assumed to be the trajectory of a weakly stationary discrete-time stochastic process $\{R(t_1), R(t_2), \dots, R(t_n)\}$. The next section deals with some stationary discrete-time stochastic processes $\{R(t),\ t \in T\}$, which are quite popular in time series analysis as models for the random component in time series.


Example 6.9 Let us again consider the time series of Table 6.1. This series is too short for long-time predictions of the development of the monthly average maximum temperatures in Johannesburg, but it is suitable as a numerical example. To eliminate the seasonal fluctuations, the M.A.(13) technique is applied. Table 6.5 shows the results. For instance, the values $y_7$ and $y_{18}$ in the smoothed series $\{y_7, y_8, \dots, y_{18}\}$ are
\[
\begin{aligned}
y_7 &= \tfrac{1}{13}\sum_{i=1}^{13} x_i = \tfrac{1}{13}\,(26.3 + 25.6 + 24.3 + 22.1 + 19.1 + 16.5 + 16.4 \\
&\qquad + 19.8 + 22.8 + 25.0 + 25.3 + 26.1 + 27.4) = 22.8, \\
y_{18} &= \tfrac{1}{13}\sum_{i=12}^{24} x_i = \tfrac{1}{13}\,(26.1 + 27.4 + 26.3 + 24.8 + 22.4 + 18.6 + 16.7 \\
&\qquad + 15.9 + 20.2 + 23.4 + 24.2 + 25.9 + 27.0) = 23.0.
\end{aligned}
\]

Month i |    7 |    8 |    9 |   10 |   11 |   12 |   13 |   14 |   15 |   16 |   17 |   18
y_i     | 22.8 | 22.8 | 22.8 | 22.6 | 22.3 | 22.2 | 22.1 | 22.4 | 22.7 | 22.8 | 22.9 | 23.0
T(t_i)  | 22.4 | 22.5 | 22.5 | 22.5 | 22.6 | 22.6 | 22.6 | 22.6 | 22.7 | 22.7 | 22.7 | 22.9
r_i     |  0.4 |  0.3 |  0.3 |  0.1 | -0.3 | -0.4 | -0.5 | -0.2 |  0.0 |  0.1 |  0.2 |  0.1

Table 6.5 Results of a time series analysis for the data of Table 6.1

The time points $t_i$ in Table 6.5 refer to the respective month, i.e. $t_i = i$, i = 7, 8, ..., 18, so that
\[ \bar{y} = \tfrac{1}{12}\sum_{i=7}^{18} y_i = 22.6 \quad\text{and}\quad \bar{t} = \tfrac{1}{12}\sum_{i=7}^{18} t_i = 12.5. \]
Table 6.5 supports the assumption that the trend of the time series $\{x_i\}$ in the interval [7, 18] is a linear one. By (6.33), estimates of its slope and intercept are $\hat{\beta} = 0.0308$ and $\hat{\alpha} = 22.215$. Hence, the linear trend of this time series between t = 7 and t = 18 is
\[ T(t) = 0.0308\,t + 22.215, \quad 7 \le t \le 18. \tag{6.35} \]

Letting t = 7, 8, ..., 18 yields the third row in Table 6.5, and the fourth row contains the effects $r_i = y_i - T(t_i)$ of the 'purely random component' R(t). Figure 6.8 shows the 'smoothed values' $y_i$ and the linear trend (6.35) obtained from these values. □

Figure 6.8 Linear trend and M.A.(13)-smoothed values for example 6.9
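The M.A.(13) values of the example can be recomputed from the data of Table 6.1. The Python sketch below does this for months 7 to 18; the helper name `centered_average` is an assumption, and months are 1-based as in the tables.

```python
# Monthly values x_1, ..., x_24 from Table 6.1 (stored 0-based here).
temps = [26.3, 25.6, 24.3, 22.1, 19.1, 16.5, 16.4, 19.8, 22.8, 25.0, 25.3, 26.1,
         27.4, 26.3, 24.8, 22.4, 18.6, 16.7, 15.9, 20.2, 23.4, 24.2, 25.9, 27.0]

def centered_average(x, k, b):
    """y_k for the 1-based month k: mean of x_{k-b}, ..., x_{k+b}."""
    window = x[k - 1 - b:k + b]          # 0-based slice covering months k-b .. k+b
    return sum(window) / len(window)

# M.A.(13) uses b = 6, so smoothed values exist for months 7, ..., 18 only.
smoothed = {k: centered_average(temps, k, b=6) for k in range(7, 19)}
print(round(smoothed[7], 1), round(smoothed[18], 1))
```

A window of 13 months covers one full seasonal period, which is why the smoothed values vary far less than the original series.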


Some statistical procedures require as input time series which are sample paths of (weakly) stationary stochastic processes (see section 6.4.4). If the time series $\{x_i\}$ has trend T(t), then the underlying stochastic process cannot be stationary. However, by replacing the original time series $\{x_i\}$ with
\[ \{y_i = x_i - T(t_i);\ i = 1, 2, \dots, n\}, \]
one frequently gets a time series which is at least approximately the sample path of a discrete-time stationary process. At least, the time series $\{y_i\}$ has no trend.

For an introduction to the theory and applications of time series, the text Chatfield (2012) is recommended. Other recent books are e.g. Madsen (2008) and Prado, West (2010).


6.4.4 Stationary Discrete-Time Stochastic Processes


This section deals with some discrete-time stochastic processes and their stationary representations, which play an important role in time series analysis. They are designed as models for the underlying mathematical structure of stochastic processes which generate the observed time series, or at least as models for their random components. Knowledge of this structure is particularly essential for the prediction of not yet observed values and for analyzing stochastic signals in communication theory. The models are related to smoothing techniques, but now the $x_i$ are no longer real numbers observed over a time interval, but time-dependent random variables. As pointed out before, discrete-time stochastic processes are actually sequences of random variables. Hence, in what follows they are written as $\{\dots, X_{-2}, X_{-1}, X_0, X_1, X_2, \dots\}$ if the process started 'in the past', and $\{X_0, X_1, \dots\}$ or $\{X_1, X_2, \dots\}$ otherwise.


Purely Random Sequence Let $\{\dots, X_{-2}, X_{-1}, X_0, X_1, X_2, \dots\}$ be a sequence of independent random variables, which are identically distributed as X with
\[ E(X) = 0 \quad\text{and}\quad \mathrm{Var}(X) = \sigma^2. \tag{6.36} \]
The trend function of this sequence is identically equal to 0:
\[ m(t) \equiv 0; \quad t = 0, \pm 1, \pm 2, \dots. \]

The covariance function of the purely random sequence is
\[
C(s,t) =
\begin{cases}
\sigma^2 & \text{for } s = t, \\
0 & \text{for } s \ne t,
\end{cases}
\]
or, letting $\tau = t - s$,
\[
C(\tau) =
\begin{cases}
\sigma^2 & \text{for } \tau = 0, \\
0 & \text{for } \tau \ne 0.
\end{cases}
\tag{6.37}
\]

The purely random sequence is also called discrete white noise. If, in addition, the $X_i$ are normally distributed, then $\{\dots, X_{-2}, X_{-1}, X_0, X_1, X_2, \dots\}$ is called a Gaussian discrete white noise. The purely random sequence is the most popular discrete-time stochastic process for modelling a random noise which is superimposed on an otherwise deterministic time-dependent phenomenon. An example of this is the stochastic process given by (6.25). Its components S(t) and T(t) are deterministic.


Sequence of Moving Averages of Order n. Notation: M.A.(n). Let the random variable $Y_t$ be given by
\[ Y_t = \sum_{i=0}^{n} c_i X_{t-i}; \quad t = 0, \pm 1, \pm 2, \dots, \]
where n is a positive integer, $c_0, c_1, \dots, c_n$ are finite real numbers, and $\{X_t\}$ is the purely random sequence with parameters (6.36) for all $t = 0, \pm 1, \pm 2, \dots$. Thus, the random variable $Y_t$ is constructed from the 'present' $X_t$ and from the n 'preceding' random variables $X_{t-1}, X_{t-2}, \dots, X_{t-n}$. This is again the principle of moving averages introduced in the previous section for the realizations of the $X_t$. In view of (4.56), page 187,
\[ \mathrm{Var}(Y_t) = \sigma^2 \sum_{i=0}^{n} c_i^2 < \infty, \quad t = 0, \pm 1, \pm 2, \dots, \]

so that $\{Y_t,\ t = 0, \pm 1, \pm 2, \dots\}$ is a second-order process. Its trend function is identically equal to 0:
\[ m(t) = E(Y_t) = 0 \quad\text{for } t = 0, \pm 1, \pm 2, \dots. \]

For integer-valued s and t,
\[
C(s,t) = E(Y_s Y_t) = E\left(\left[\sum_{i=0}^{n} c_i X_{s-i}\right]\left[\sum_{k=0}^{n} c_k X_{t-k}\right]\right)
= E\left(\sum_{i=0}^{n}\sum_{k=0}^{n} c_i c_k X_{s-i} X_{t-k}\right).
\]

Since $E(X_{s-i}X_{t-k}) = 0$ for $s - i \ne t - k$, the double sum is 0 when $|t-s| > n$. Otherwise there exist i and k so that $s - i = t - k$. In this case C(s, t) becomes
\[
C(s,t) = E\left(\sum_{\substack{0 \le i \le n \\ 0 \le |t-s|+i \le n}} c_i\, c_{|t-s|+i}\, X_{s-i}^2\right)
= \sigma^2 \sum_{i=0}^{n-|t-s|} c_i\, c_{|t-s|+i}.
\]

Letting $\tau = t - s$, the covariance function $C(s,t) = C(\tau)$ becomes
\[
C(\tau) =
\begin{cases}
\sigma^2\,[\,c_0 c_{|\tau|} + c_1 c_{|\tau|+1} + \cdots + c_{n-|\tau|}\, c_n\,] & \text{for } 0 \le |\tau| \le n, \\
0 & \text{for } |\tau| > n.
\end{cases}
\tag{6.38}
\]

Thus, the sequence of moving averages $\{Y_t,\ t = 0, \pm 1, \pm 2, \dots\}$ is weakly stationary.
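Formula (6.38) translates directly into code. The Python sketch below evaluates it for arbitrary coefficients and, as an illustration, for equal weights $c_i = 1/(n+1)$, where the sum reduces to $\sigma^2 (n+1-|\tau|)/(n+1)^2$. The function name `ma_covariance` and the parameter values are assumptions chosen for illustration.

```python
def ma_covariance(c, sigma2, tau):
    """C(tau) = sigma2 * sum_{i=0}^{n-|tau|} c_i * c_{|tau|+i} for |tau| <= n, else 0."""
    n = len(c) - 1
    d = abs(tau)
    if d > n:
        return 0.0
    return sigma2 * sum(c[i] * c[d + i] for i in range(n - d + 1))

n, sigma2 = 4, 2.0                      # illustrative order and noise variance
equal = [1.0 / (n + 1)] * (n + 1)       # equal weights c_i = 1/(n+1)
for tau in range(-n - 1, n + 2):
    print(tau, ma_covariance(equal, sigma2, tau))
```

The printed values decrease linearly in $|\tau|$ and are zero beyond $|\tau| = n$, which reflects that $Y_s$ and $Y_t$ share no noise terms once they are more than n steps apart.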


Special case: Let $c_i = \dfrac{1}{n+1}$; i = 0, 1, ..., n. Then the sequence M.A.(n) becomes
\[ Y_t = \frac{1}{n+1}\sum_{i=0}^{n} X_{t-i}; \quad t = 0, \pm 1, \pm 2, \dots, \]

and the covariance function (6.38) simplifies to
\[
C(\tau) =
\begin{cases}
\dfrac{\sigma^2}{n+1}\left(1 - \dfrac{|\tau|}{n+1}\right) & \text{for } 0 \le |\tau| \le n, \\
0 & \text{for } |\tau| > n.
\end{cases}
\]

Sequence of Moving Averages of Unbounded Order. Notation: M.A.(∞). Let
\[ Y_t = \sum_{i=0}^{\infty} c_i X_{t-i}; \quad t = 0, \pm 1, \pm 2, \dots, \tag{6.39} \]
where $\{X_t\}$ is the purely random sequence with parameters (6.36), and the $c_i$ are real numbers.

Remark The random sequence $\{Y_t,\ t = 0, \pm 1, \pm 2, \dots\}$ defined in this way is sometimes called a linear stochastic process.

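To guarantee the convergence of the infinite series (6.39) in mean square, the $c_i$ must satisfy
\[ \sum_{i=0}^{\infty} c_i^2 < \infty. \tag{6.40} \]

From (6.38), the covariance function of the sequence M.A.(∞) is
\[ C(\tau) = \sigma^2 \sum_{i=0}^{\infty} c_i\, c_{|\tau|+i}; \quad \tau = 0, \pm 1, \pm 2, \dots. \tag{6.41} \]
In particular, the variance of $Y_t$ is
\[ \mathrm{Var}(Y_t) = C(0) = \sigma^2 \sum_{i=0}^{\infty} c_i^2; \quad t = 0, \pm 1, \pm 2, \dots. \]

If the doubly infinite sequence of real numbers
\[ \{\dots, c_{-2}, c_{-1}, c_0, c_1, c_2, \dots\} \]
satisfies the condition
\[ \sum_{i=-\infty}^{\infty} c_i^2 < \infty, \]
then the doubly infinite sequence of random variables
\[ \{\dots, Y_{-2}, Y_{-1}, Y_0, Y_1, Y_2, \dots\} \]
defined by
\[ Y_t = \sum_{i=-\infty}^{\infty} c_i X_{t-i}; \quad t = 0, \pm 1, \pm 2, \dots, \tag{6.42} \]

is also weakly stationary, and it has covariance function
\[ C(\tau) = \sigma^2 \sum_{i=-\infty}^{\infty} c_i\, c_{|\tau|+i}; \quad \tau = 0, \pm 1, \pm 2, \dots, \]
and variance
\[ \mathrm{Var}(Y_t) = \sigma^2 \sum_{i=-\infty}^{\infty} c_i^2; \quad t = 0, \pm 1, \pm 2, \dots. \]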

In order to distinguish between the sequences of structure (6.39) and (6.42), they are called one- and two-sided sequences of moving averages, respectively.

Autoregressive Sequence of Order 1 (Notation: AR(1)) Let a and b be finite real numbers with $|a| < 1$. Then a doubly infinite sequence $\{Y_t\}$ is recursively generated by the equation
\[ Y_t = a\,Y_{t-1} + b\,X_t; \quad t = 0, \pm 1, \pm 2, \dots, \tag{6.43} \]
where $\{X_t\}$ is the purely random sequence with parameters (6.36). (Note the analogy to the recursive equation (6.31).) Thus, the 'present' state $Y_t$ depends directly on the preceding one $Y_{t-1}$ and on a random noise term $bX_t$ with mean value 0 and variance $b^2\sigma^2$. The n-fold application of (6.43) yields
\[ Y_t = a^n\,Y_{t-n} + b\sum_{i=0}^{n-1} a^i X_{t-i}. \tag{6.44} \]

This formula shows that the influence of a past state $Y_{t-n}$ on the present state $Y_t$ on average decreases as the distance n between $Y_{t-n}$ and $Y_t$ increases. Hence it can be anticipated that the solution of the recursive equation (6.43) is a stationary process. This stationary solution is obtained by letting n tend to infinity in (6.44): Since $\lim_{n\to\infty} a^n = 0$,
\[ Y_t = b\sum_{i=0}^{\infty} a^i X_{t-i}, \quad t = 0, \pm 1, \pm 2, \dots. \tag{6.45} \]

The doubly infinite random sequence $\{Y_t;\ t = 0, \pm 1, \pm 2, \dots\}$ generated in this way is called a first-order autoregressive sequence or an autoregressive sequence of order 1 (shortly: AR(1)). This sequence is a special case of the random sequence defined by (6.39), since letting there $c_i = b\,a^i$ makes the sequences (6.39) and (6.45) formally identical. Moreover, condition (6.40) is fulfilled:
\[ b^2\sum_{i=0}^{\infty} (a^i)^2 = b^2\sum_{i=0}^{\infty} a^{2i} = \frac{b^2}{1-a^2} < \infty. \]

Thus, an autoregressive sequence of order 1 is a weakly stationary sequence. Its covariance function is given by formula (6.41) with $c_i = b\,a^i$:
\[ C(\tau) = (b\sigma)^2 \sum_{i=0}^{\infty} a^i a^{|\tau|+i} = (b\sigma)^2\, a^{|\tau|} \sum_{i=0}^{\infty} a^{2i}, \]

so that
\[ C(\tau) = \frac{(b\sigma)^2}{1-a^2}\,a^{|\tau|}; \quad \tau = 0, \pm 1, \pm 2, \dots. \]

Autoregressive Sequence of Order r (Notation: AR(r)) Generalizing the recursive equation (6.43), let, for a given sequence of finite real numbers $a_1, a_2, \dots, a_r$ and a finite integer r, random variables $Y_t$ be generated by
\[ Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \cdots + a_r Y_{t-r} + b\,X_t, \tag{6.46} \]

where $\{X_t\}$ is a purely random sequence with parameters (6.36). The sequence $\{Y_t;\ t = 0, \pm 1, \pm 2, \dots\}$ is called an autoregressive sequence of order r.

It is interesting to investigate whether, analogously to the previous example, a weakly stationary sequence
\[ Y_t = \sum_{i=0}^{\infty} c_i X_{t-i}; \quad t = 0, \pm 1, \pm 2, \dots, \tag{6.47} \]
exists which is a solution of (6.46). Substituting (6.47) into (6.46) yields a linear algebraic system of equations for the unknown parameters $c_i$:
\[
\begin{aligned}
c_0 &= b \\
c_1 - a_1 c_0 &= 0 \\
c_2 - a_1 c_1 - a_2 c_0 &= 0 \\
&\;\;\vdots \\
c_r - a_1 c_{r-1} - \cdots - a_r c_0 &= 0 \\
c_i - a_1 c_{i-1} - \cdots - a_r c_{i-r} &= 0; \quad i = r+1, r+2, \dots.
\end{aligned}
\]
It can be shown that a nontrivial solution $\{c_0, c_1, \dots\}$ of this system exists which satisfies condition (6.40) if the absolute values of the solutions $y_1, y_2, \dots, y_r$ of the algebraic equation
\[ y^r - a_1 y^{r-1} - \cdots - a_{r-1}\,y - a_r = 0 \tag{6.48} \]
are all less than 1, i.e., they are within the unit circle. (Note that this is solely a property of the sequence $a_1, a_2, \dots, a_r$.) In this case, the sequence $\{Y_t;\ t = 0, \pm 1, \pm 2, \dots\}$ given by (6.47) is a weakly stationary solution of (6.46).
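The system for the $c_i$ is triangular, so the coefficients can be generated one after the other. The Python sketch below does this for a finite number of coefficients; the function name `ma_coefficients` and the numerical values are illustrative assumptions.

```python
def ma_coefficients(a, b, count):
    """First `count` coefficients c_0, c_1, ... solving
    c_0 = b and c_i = a_1 c_{i-1} + ... + a_r c_{i-r} (terms with negative index dropped)."""
    c = []
    for i in range(count):
        if i == 0:
            c.append(b)
        else:
            # a[j] holds a_{j+1}; c[i - 1 - j] holds c_{i-(j+1)}.
            c.append(sum(a[j] * c[i - 1 - j] for j in range(min(len(a), i))))
    return c

# For r = 1 the recursion reproduces c_i = b * a^i of the AR(1) case.
print(ma_coefficients([0.5], 2.0, 5))
# Illustrative AR(2) coefficients a_1 = 0.6, a_2 = -0.05 with b = 2.
print(ma_coefficients([0.6, -0.05], 2.0, 5))
```

When the roots of (6.48) lie inside the unit circle, the generated coefficients decay geometrically, which is what makes the sum of their squares finite.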

Special Case r = 2 Let $y_1$ and $y_2$ be the solutions of
\[ y^2 - a_1 y - a_2 = 0 \tag{6.49} \]
with $|y_1| < 1$ and $|y_2| < 1$. Then, without proof, the covariance function of the corresponding weakly stationary autoregressive sequence of order 2 is, for $y_1 \ne y_2$,
\[
C(\tau) = C(0)\,\frac{(1-y_2^2)\,y_1^{|\tau|+1} - (1-y_1^2)\,y_2^{|\tau|+1}}{(y_1 - y_2)(1 + y_1 y_2)}; \quad \tau = 0, \pm 1, \pm 2, \dots, \tag{6.50}
\]
and for $y_1 = y_2 = y_0$
\[
C(\tau) = C(0)\left[1 + \frac{1-y_0^2}{1+y_0^2}\,|\tau|\right] y_0^{|\tau|}; \quad \tau = 0, \pm 1, \pm 2, \dots, \tag{6.51}
\]
where the variance $C(0) = \mathrm{Var}(Y_t)$ both in (6.50) and (6.51) is
\[ C(0) = \frac{1-a_2}{(1+a_2)\,[(1-a_2)^2 - a_1^2]}\,(b\sigma)^2. \]

If the solutions of (6.49) are complex, say,
\[ y_1 = y_0\,e^{i\omega} \quad\text{and}\quad y_2 = y_0\,e^{-i\omega} \]
with real numbers $y_0$ and ω, then the covariance function assumes a more convenient form than (6.50):
\[ C(\tau) = C(0)\,\alpha\,y_0^{|\tau|}\,\sin(\omega|\tau| + \beta); \quad \tau = 0, \pm 1, \pm 2, \dots, \]

where
\[ \alpha = \frac{1}{\sin\beta} \quad\text{and}\quad \beta = \arctan\left(\frac{1+y_0^2}{1-y_0^2}\,\tan\omega\right). \]

If $y_1 = y_2 = y_0$, then this representation of $C(\tau)$ is identical to (6.51).


Example 6.10 Consider an autoregressive sequence of order 2 given by
\[ Y_t = 0.6\,Y_{t-1} - 0.05\,Y_{t-2} + 2X_t; \quad t = 0, \pm 1, \pm 2, \dots, \]

with $\sigma^2 = \mathrm{Var}(X_t) = 1$. It is obvious that the influence of $Y_{t-2}$ on $Y_t$ is small compared to the influence of $Y_{t-1}$ on $Y_t$. The corresponding algebraic equation (6.49) is
\[ y^2 - 0.6\,y + 0.05 = 0. \]

The solutions are $y_1 = 0.1$ and $y_2 = 0.5$. The absolute values of $y_1$ and $y_2$ are smaller than 1, so that the random sequence generated by (6.46) is weakly stationary. Its covariance function is obtained from (6.50):
\[ C(\tau) = 7.017\,(0.5)^{|\tau|} - 1.063\,(0.1)^{|\tau|}; \quad \tau = 0, \pm 1, \pm 2, \dots. \]

As expected, with increasing $|\tau| = |t-s|$, i.e., with increasing time distance between $Y_t$ and $Y_s$, the covariance decreases. The variance has for all t the value
\[ \mathrm{Var}(Y_t) = C(0) = 5.954. \] □
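The numbers of this example can be reproduced from the AR(2) formulas for the variance and for the covariance function with distinct real roots. The Python sketch below is written under the assumption that these formulas read as implemented in the code; the variable names are illustrative.

```python
import math

a1, a2, b, sigma2 = 0.6, -0.05, 2.0, 1.0

# Roots of y^2 - a1*y - a2 = 0.
disc = math.sqrt(a1 * a1 + 4.0 * a2)
y1, y2 = (a1 - disc) / 2.0, (a1 + disc) / 2.0

# Variance C(0) of the stationary AR(2) sequence.
c0 = (1.0 - a2) / ((1.0 + a2) * ((1.0 - a2) ** 2 - a1 ** 2)) * b * b * sigma2

def covariance(tau):
    """C(tau) for distinct real roots y1 != y2."""
    d = abs(tau)
    num = (1.0 - y2 ** 2) * y1 ** (d + 1) - (1.0 - y1 ** 2) * y2 ** (d + 1)
    return c0 * num / ((y1 - y2) * (1.0 + y1 * y2))

# Coefficients of y2^|tau| and y1^|tau| when C(tau) is written as a sum of two powers.
coef_y2 = -c0 * (1.0 - y1 ** 2) * y2 / ((y1 - y2) * (1.0 + y1 * y2))
coef_y1 = c0 * (1.0 - y2 ** 2) * y1 / ((y1 - y2) * (1.0 + y1 * y2))
print(round(y1, 3), round(y2, 3), round(c0, 3), round(coef_y2, 3), round(coef_y1, 3))
```

Up to rounding in the last digit, the computed roots, variance, and coefficients agree with the values 0.1, 0.5, 5.954, 7.017, and −1.063 stated in the example.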


Autoregressive Moving Average (r, s)-Models. (Notation: ARMA(r, s)). Let the random sequence $\{Y_t;\ t = 0, \pm 1, \pm 2, \dots\}$ be generated by
\[
\begin{aligned}
Y_t &= a_1 Y_{t-1} + a_2 Y_{t-2} + \cdots + a_r Y_{t-r} \\
&\quad + b_0 X_t + b_1 X_{t-1} + \cdots + b_s X_{t-s},
\end{aligned}
\tag{6.52}
\]

where $\{X_t\}$ is the purely random sequence with parameters (6.36). It can be shown that (6.52) also generates a stationary random sequence $\{Y_t\}$ if the absolute values of the solutions of the algebraic equation (6.48) are less than 1.


The practical work with ARMA models and their special cases is facilitated by the use of statistical software packages. Important problems are: estimation of the parameters $a_i$ and $b_i$ in (6.46) and (6.52), estimation of trend functions, and detection and quantification of possible cyclic, seasonal, and other systematic influences. In particular, reliable predictions are only possible if structure and properties of the random component $\{R(t),\ t \in T\}$, such as stationarity, Markov property, and other properties not taken into account in this short section, are known.


6.5 EXERCISES


6.1) A stochastic process $\{X(t),\ t > 0\}$ has the one-dimensional distribution
\[ \{F_t(x) = P(X(t) \le x) = 1 - e^{-(x/t)^2},\ x \ge 0,\ t > 0\}. \]


Is this process weakly stationary? 


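6.2) The one-dimensional distribution of a stochastic process $\{X(t),\ t > 0\}$ is
\[ F_t(x) = P(X(t) \le x) = \frac{1}{\sqrt{2\pi t}\,\sigma}\int_{-\infty}^{x} e^{-\frac{(u-\mu t)^2}{2\sigma^2 t}}\,du \]
with $\mu > 0$, $\sigma > 0$; $x \in (-\infty, +\infty)$.

Determine its trend function m(t) and, for μ = 2 and σ = 0.5, sketch the functions
\[ y_1(t) = m(t) + \sqrt{\mathrm{Var}(X(t))} \quad\text{and}\quad y_2(t) = m(t) - \sqrt{\mathrm{Var}(X(t))}. \]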


6.3) Let X(f) = A sin(wt+ ©), where 4 and © are independent, non-negative random 
variables with ® uniformly distributed over [0,27] and E(A) < 0. 


(1) Determine trend, covariance, and correlation function of {X(A), t € (—c0, +00)}. 


(2) Is the stochastic process { X(t), t € (—00, +00)} weakly and/or strongly stationary? 


6.4) Let $X(t) = A(t) \sin(\omega t + \Phi)$, where $A(t)$ and $\Phi$ are independent, non-negative
random variables for all $t$, and let $\Phi$ be uniformly distributed over $[0, 2\pi]$.

Verify: If $\{A(t),\ t \in (-\infty, +\infty)\}$ is a weakly stationary process, then the stochastic
process $\{X(t),\ t \in (-\infty, +\infty)\}$ is also weakly stationary.


6.5) Let $\{a_1, a_2, \ldots, a_n\}$ be a sequence of real numbers, and $\{\Phi_1, \Phi_2, \ldots, \Phi_n\}$ be a
sequence of independent random variables, uniformly distributed over $[0, 2\pi]$.

Determine covariance and correlation function of the process $\{X(t),\ t \in (-\infty, +\infty)\}$
given by
$$X(t) = \sum_{i=1}^{n} a_i \sin(\omega t + \Phi_i).$$


6.6)* A modulated signal (pulse code modulation) $\{X(t),\ t \in (-\infty, +\infty)\}$ is given by
$$X(t) = \sum_{n=-\infty}^{+\infty} A_n h(t - n),$$

where the $A_n$ are independent and identically distributed random variables which
can only take on the values $-1$ and $+1$ and have mean value 0. Further, let
$$h(t) = \begin{cases} 1 & \text{for } 0 \le t < 1/2, \\ 0 & \text{elsewhere.} \end{cases}$$
(1) Sketch a possible sample path of the stochastic process $\{X(t),\ t \in (-\infty, +\infty)\}$.

(2) Determine the covariance function of this process.

(3) Let $Y(t) = X(t - Z)$, where the random variable $Z$ has a uniform distribution over
$(0, 1]$.

Is $\{Y(t),\ t \in (-\infty, +\infty)\}$ a weakly stationary process?


6.7) Let $\{X(t),\ t \in (-\infty, +\infty)\}$ and $\{Y(t),\ t \in (-\infty, +\infty)\}$ be two independent, weakly
stationary stochastic processes, whose trend functions are identically 0 and which
have the same covariance function $C(\tau)$.

Verify: The stochastic process $\{Z(t),\ t \in (-\infty, +\infty)\}$ with
$$Z(t) = X(t) \cos \omega t - Y(t) \sin \omega t$$

is weakly stationary.


6.8) Let $X(t) = \sin \Phi t$, where $\Phi$ is uniformly distributed over the interval $[0, 2\pi]$.
Verify: (1) The discrete-time stochastic process $\{X(t);\ t = 1, 2, \ldots\}$ is weakly, but not
strongly stationary.

(2) The continuous-time stochastic process $\{X(t),\ t \ge 0\}$ is neither weakly nor strongly
stationary.


6.9) Let $\{X(t),\ t \in (-\infty, +\infty)\}$ and $\{Y(t),\ t \in (-\infty, +\infty)\}$ be two independent stochastic
processes with trend and covariance functions

$$m_X(t),\ m_Y(t) \quad \text{and} \quad C_X(s,t),\ C_Y(s,t),$$
respectively. Further, let
$$U(t) = X(t) + Y(t) \quad \text{and} \quad V(t) = X(t) - Y(t), \quad t \in (-\infty, +\infty).$$
Determine the covariance functions of the stochastic processes $\{U(t),\ t \in (-\infty, +\infty)\}$
and $\{V(t),\ t \in (-\infty, +\infty)\}$.


6.10) The following table shows the annual, inflation-adjusted profits of a bank in the
years from 2005 to 2015 [in 10^6 dollars].

Year i     | 1 (2005) | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11
Profit x_i | 0.549    | 1.062 | 1.023 | 1.431 | 2.100 | 1.809 | 2.250 | 3.150 | 3.636 | 3.204 | 4.173

(1) Determine the smoothed values $\{y_i\}$ obtained by applying M.A.(3).
(2) Based on the $y_i$, determine the trend function (assumed to be a straight line).

(3) Draw the original time series plot, the smoothed version based on the $y_i$, and the
trend function in one and the same figure.
6.11) The following table shows the production figures $x_i$ of cars of a company over
a time period of 12 years (in 10^3).

Year i | 1    | 2    | 3    | 4    | 5    | 6     | 7     | 8     | 9     | 10    | 11    | 12
x_i    | 3.08 | 3.40 | 4.00 | 5.24 | 7.56 | 10.68 | 13.72 | 18.36 | 23.20 | 28.36 | 34.68 | 40.44

(1) Draw a time series plot. Is the underlying trend function linear?
(2) Smooth the time series $\{x_i\}$ by the Epanechnikov kernel with bandwidth $[-2, +2]$.

(3) Smooth the time series $\{x_i\}$ by exponential smoothing with parameter $\lambda = 0.6$
and predict the output for year 13 by the recursive equation (6.31).


6.12) Let $Y_t = 0.8\,Y_{t-1} + X_t;\ t = 0, \pm 1, \pm 2, \ldots$, where $\{X_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ is the
purely random sequence with parameters $E(X_t) = 0$ and $Var(X_t) = 1$.

Determine the covariance function and sketch the correlation function of the
autoregressive sequence of order 1 $\{Y_t;\ t = 0, \pm 1, \pm 2, \ldots\}$.


6.13) Let an autoregressive sequence of order 2 $\{Y_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ be given by
$$Y_t - 1.6\,Y_{t-1} + 0.68\,Y_{t-2} = 2X_t;\quad t = 0, \pm 1, \pm 2, \ldots,$$

where $\{X_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ is the same purely random sequence as in the previous
exercise.

(1) Is the sequence $\{Y_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ weakly stationary?

(2) Determine its covariance and correlation function.


6.14) Let an autoregressive sequence of order 2 $\{Y_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ be given by
$$Y_t - 0.8\,Y_{t-1} - 0.09\,Y_{t-2} = X_t;\quad t = 0, \pm 1, \pm 2, \ldots,$$

where $\{X_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ is the same purely random sequence as in exercise 6.12.
(1) Check whether the sequence $\{Y_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ is weakly stationary. If yes,
then determine its covariance function and its correlation function.

(2) Sketch its correlation function and compare its graph with the one obtained in
exercise 6.12.


CHAPTER 7 


Random Point Processes 


7.1 BASIC CONCEPTS 


A point process is a sequence of real numbers $\{t_1, t_2, \ldots\}$ with properties

$$t_1 < t_2 < \cdots \quad \text{and} \quad \lim_{i \to \infty} t_i = +\infty. \tag{7.1}$$

That means, a point process is a strictly increasing sequence of real numbers which
does not have a finite limit point. In practice, point processes occur in numerous
situations: arrival time points of customers at service stations (workshops, filling stations,
supermarkets, ...), failure time points of machines, time points of traffic accidents,
occurrence of natural disasters, occurrence of supernovas, .... Generally, at time point
$t_i$ a certain event happens. Hence, the $t_i$ are called event times. With regard to the
arrival of customers at service stations, the $t_i$ are also called arrival times. If not stated
otherwise, the assumption $t_1 \ge 0$ is made.
Although the majority of applications of point processes refer to sequences of time
points, there are other interpretations as well. For instance, sequences $\{t_1, t_2, \ldots\}$ can
be generated by the location of potholes in a road. Then $t_i$ denotes the distance of the
$i$th pothole from the beginning of the road. Or the locations are measured at which a
beam, randomly directed at a forest stand, hits trees. (This is the basis of the
Bitterlich method for estimating the total number of trees in a forest stand.) All these
applications deal with finite lengths (time or other). To meet assumption (7.1), they
have to be considered finite samples from the respective point processes.


A point process $\{t_1, t_2, \ldots\}$ can equivalently be represented by the sequence of its
interevent (interarrival) times
$$\{y_1, y_2, \ldots\} \quad \text{with} \quad y_i = t_i - t_{i-1};\ i = 1, 2, \ldots;\ t_0 = 0.$$
Counting Process Frequently, the event times are of less interest than the number of
events which occur in an interval $(0, t]$, $t > 0$. This number is denoted as $n(t)$:
$$n(t) = \max\{n,\ t_n \le t\}.$$

For obvious reasons, $\{n(t),\ t \ge 0\}$ is said to be the counting process belonging to the
point process $\{t_1, t_2, \ldots\}$. Here and in what follows, it is assumed that more than one
event cannot occur at a time. Point processes with this property are called simple. The
number of events which occur in an interval $(s, t]$, $s < t$, is

$$n(s, t) = n(t) - n(s).$$
To be able to count the number $n(A)$ of events which occur in an arbitrary subset $A$
of $[0, \infty)$, the indicator function of the event '$t_i$ belongs to $A$' is introduced:

$$I_i(A) = \begin{cases} 1 & \text{if } t_i \in A, \\ 0 & \text{otherwise.} \end{cases} \tag{7.2}$$

Then,
$$n(A) = \sum_{i=1}^{\infty} I_i(A).$$


Example 7.1 Let a finite sample from a point process be given:
$$S = \{2, 4, 10, 18, 24, 31, 35, 38, 40, 44, 45, 51, 57, 59\}.$$

These figures indicate the times (in seconds) at which cars pass a speed check point
within a time span of one minute. In particular, in the interval $A = (30, 45]$

$$n(30, 45) = n(45) - n(30) = 11 - 5 = 6$$

cars passed this check point. Or, in terms of the indicator function of the event
$A = (30, 45]$, indexed here by the event times themselves,

$$I_{31}(A) = I_{35}(A) = I_{38}(A) = I_{40}(A) = I_{44}(A) = I_{45}(A) = 1,$$
$$I_i(A) = 0 \quad \text{for } i \in S \setminus A.$$

Hence,
$$n(30, 45) = \sum_{i \in S} I_i(A) = 6. \qquad \square$$
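
The counts in example 7.1 can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `n` and `n_interval` are chosen here for illustration, and the event times are assumed to be stored in a sorted list.

```python
from bisect import bisect_right

# Event times (in seconds) from example 7.1, already sorted.
S = [2, 4, 10, 18, 24, 31, 35, 38, 40, 44, 45, 51, 57, 59]

def n(t, events=S):
    """Number of events in (0, t], i.e. the number of event times t_i <= t."""
    return bisect_right(events, t)

def n_interval(s, t, events=S):
    """Number of events in (s, t]: n(s, t) = n(t) - n(s)."""
    return n(t, events) - n(s, events)

print(n(30), n(45), n_interval(30, 45))
```

Using `bisect_right` makes an event at exactly time t count towards n(t), which matches the half-open interval (s, t].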


Recurrence Times The forward recurrence time of a point process $\{t_1, t_2, \ldots\}$ with
respect to time point $t$ is defined as
$$a(t) = t_{n+1} - t \quad \text{for } t_n \le t < t_{n+1};\ n = 0, 1, \ldots;\ t_0 = 0. \tag{7.3}$$
Hence, $a(t)$ is the time span from $t$ (usually interpreted as the 'present') to the
occurrence of the next event. A simpler way of characterizing $a(t)$ is
$$a(t) = t_{n(t)+1} - t. \tag{7.4}$$
Here $t_{n(t)}$ is the largest event time not exceeding $t$, and $t_{n(t)+1}$ is the smallest event time after $t$.

The backward recurrence time $b(t)$ with respect to time point $t$ is
$$b(t) = t - t_{n(t)}. \tag{7.5}$$

Thus, $b(t)$ is the time which has elapsed from the last event time before $t$ to time $t$.
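
As an illustration, both recurrence times can be read off a sorted list of event times. The sketch below assumes a finite sample, so the forward recurrence time is reported as `None` when the sample contains no event after t; the function name `recurrence_times` is chosen here and is not part of the text.

```python
from bisect import bisect_right

def recurrence_times(events, t):
    """Return (a(t), b(t)) for a sorted list of event times, using t_0 = 0.

    a(t) = t_{n(t)+1} - t  (forward recurrence time, None if the finite
                            sample contains no event after t)
    b(t) = t - t_{n(t)}    (backward recurrence time)
    """
    k = bisect_right(events, t)               # k = n(t)
    last = events[k - 1] if k > 0 else 0      # t_{n(t)}, with t_0 = 0
    forward = events[k] - t if k < len(events) else None
    backward = t - last
    return forward, backward

# Event times from example 7.1.
S = [2, 4, 10, 18, 24, 31, 35, 38, 40, 44, 45, 51, 57, 59]
print(recurrence_times(S, 30))
```

For the data of example 7.1 and t = 30, the next car passes at 31 and the last one passed at 24.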


Marked Point Processes Frequently, in addition to their arrival times, events come
with another piece of information. For instance: If $t_i$ is the time point the $i$th customer
arrives at a supermarket, then the customer will spend there a certain amount of
money $m_i$. If $t_i$ is the failure time point of a machine, then the time (or cost) $m_i$
necessary for repairing the machine may be assigned to $t_i$. If $t_i$ denotes the time of the
$i$th bank robbery in a town, then the amount $m_i$ the robbers got away with is of
interest. If $t_i$ is the arrival time of the $i$th claim at an insurance company, then the size
$m_i$ of this claim is important to the company. If $t_i$ is the time of the $i$th supernova in
a century, then its light intensity $m_i$ is of interest to astronomers, and so on. This
leads to the concept of a marked point process: Given a point process $\{t_1, t_2, \ldots\}$, a
sequence of two-dimensional vectors

$$\{(t_1, m_1), (t_2, m_2), \ldots\} \tag{7.6}$$

with $m_i$ being an element of a mark space $M$ is called a marked point process. In most
applications, as in the examples above, the mark space $M$ is a subset of the real
axis $(-\infty, +\infty)$ with the respective units of measurement attached.


Random Point Processes Usually the event times are random variables. A sequence
of random variables $\{T_1, T_2, \ldots\}$ with

$$T_1 < T_2 < \cdots \quad \text{and} \quad P\left(\lim_{i \to \infty} T_i = +\infty\right) = 1 \tag{7.7}$$

is a random point process. By introducing the random interevent (interarrival) times
$$Y_i = T_i - T_{i-1};\ i = 1, 2, \ldots;\ T_0 = 0,$$
a random point process can equivalently be defined as a sequence of positive random
variables $\{Y_1, Y_2, \ldots\}$ with property
$$P\left(\lim_{n \to \infty} \sum_{i=1}^{n} Y_i = \infty\right) = 1.$$

With the terminology introduced in section 6.1, a random point process is a discrete-time
stochastic process with state space $Z = [0, +\infty)$. Thus, a point process (7.1) is a
sample path (realization) of a random point process. A random point process is called
simple if at any time point $t$ not more than one event can occur.


Recurrent Point Processes A random point process $\{T_1, T_2, \ldots\}$ is said to be recurrent
if its corresponding sequence of interarrival times $\{Y_1, Y_2, \ldots\}$ is a sequence of
independent, identically distributed random variables. The most important recurrent
point processes are homogeneous Poisson processes and renewal processes (sections
7.2 and 7.3).


Random Counting Processes Let

$$N(t) = \max\{n,\ T_n \le t\}$$
be the random number of events occurring in the interval $(0, t]$. Then the continuous-time
stochastic process $\{N(t),\ t \ge 0\}$ with state space $Z = \{0, 1, \ldots\}$ is called the random
counting process belonging to the random point process $\{T_1, T_2, \ldots\}$. Any counting
process $\{N(t),\ t \ge 0\}$ has properties
1) $N(0) = 0$,
2) $N(s) \le N(t)$ for $s \le t$,
3) For any $s$, $t$ with $0 \le s < t$, the increment $N(s, t) = N(t) - N(s)$ is equal to the number
of events which occur in $(s, t]$.
Conversely, every stochastic process $\{N(t),\ t \ge 0\}$ in continuous time having these
three properties is the counting process of a certain random point process $\{T_1, T_2, \ldots\}$.
Thus, from the statistical point of view, the stochastic processes

$$\{T_1, T_2, \ldots\}, \quad \{Y_1, Y_2, \ldots\}, \quad \text{and} \quad \{N(t),\ t \ge 0\}$$
are equivalent. For that reason, a random point process is frequently defined as a
continuous-time stochastic process $\{N(t),\ t \ge 0\}$ with properties 1 to 3. Note that
$$N(t) = N(0, t).$$
The most important characteristic of a counting process $\{N(t),\ t \ge 0\}$ is the probability
distribution of its increments $N(s, t) = N(t) - N(s)$, which determines for all intervals
$(s, t]$, $s < t$, the probabilities
$$p_k(s, t) = P(N(s, t) = k);\quad k = 0, 1, \ldots.$$

The mean number of events in $(s, t]$ is

$$m(s, t) = m(t) - m(s) = E(N(s, t)) = \sum_{k=0}^{\infty} k\, p_k(s, t). \tag{7.8}$$
With $p_k(t) = p_k(0, t)$,
the trend function of the counting process $\{N(t),\ t \ge 0\}$ is
$$m(t) = E(N(t)) = \sum_{k=0}^{\infty} k\, p_k(t), \quad t \ge 0. \tag{7.9}$$


A random counting process is called simple if the underlying point process is simple. 
Figure 7.1 shows a possible sample path of a simple random counting process. 


Note In what follows the attribute 'random' is usually omitted if it is obvious from the notation 
or the context that random point processes or random counting processes are being dealt with. 


Definition 7.1 (stationarity) A random point process $\{T_1, T_2, \ldots\}$ is called stationary
if its sequence of interarrival times $\{Y_1, Y_2, \ldots\}$ is strongly stationary (section 6.3,
page 230), that is, if for any sequence of integers $i_1, i_2, \ldots, i_k$ with property

$$1 \le i_1 < i_2 < \cdots < i_k, \quad k = 1, 2, \ldots$$

and for any $\tau = 0, 1, 2, \ldots$, the joint distribution functions of the following two random
vectors coincide:

$$\{Y_{i_1}, Y_{i_2}, \ldots, Y_{i_k}\} \quad \text{and} \quad \{Y_{i_1+\tau}, Y_{i_2+\tau}, \ldots, Y_{i_k+\tau}\}.$$

Figure 7.1 Sample path of a simple counting process
It is an easy exercise to show that if the sequence $\{Y_1, Y_2, \ldots\}$ is strongly stationary,
the corresponding counting process $\{N(t),\ t \ge 0\}$ has homogeneous increments and
vice versa. This implies the following corollary from definition 7.1:

Corollary A point process $\{T_1, T_2, \ldots\}$ is stationary if and only if its corresponding
counting process $\{N(t),\ t \ge 0\}$ has homogeneous increments.

Therefore, the probability distribution of any increment $N(s, t)$ of a stationary point
process depends only on the difference $\tau = t - s$:

$$p_k(\tau) = P(N(s, s + \tau) = k);\quad k = 0, 1, \ldots;\ s \ge 0,\ \tau > 0. \tag{7.10}$$
Thus, for a stationary point process,
$$m(\tau) = m(s, s + \tau) = m(s + \tau) - m(s) \quad \text{for all } s \ge 0,\ \tau \ge 0. \tag{7.11}$$

Because their sample paths are nondecreasing, neither the point process $\{T_1, T_2, \ldots\}$ nor its
corresponding counting process $\{N(t),\ t \ge 0\}$ can be strongly or weakly stationary as
defined in section 6.3. In particular, since only simple point processes are considered,
the sample paths of $\{N(t),\ t \ge 0\}$ are step functions with jump heights equal to 1.


Remark Sometimes it is more convenient or even necessary to define random point
processes as doubly infinite sequences

$$\{\ldots, T_{-2}, T_{-1}, T_0, T_1, T_2, \ldots\},$$

which tend to infinity to the left and to the right with probability 1. Then their sample
paths are also doubly infinite sequences $\{\ldots, t_{-2}, t_{-1}, t_0, t_1, t_2, \ldots\}$, and only the
increments of the corresponding counting process over finite intervals are finite.


Intensity of Random Point Processes For stationary point processes, the mean number
of events occurring in $[0, 1]$ is called the intensity of the process and will be
denoted as $\lambda$. By making use of notation (7.9),

$$\lambda = m(1) = \sum_{k=0}^{\infty} k\, p_k(1). \tag{7.12}$$
In view of the stationarity, $\lambda$ is equal to the mean number of events occurring in any
interval of length 1:
$$\lambda = m(s, s + 1), \quad s \ge 0.$$
The mean number of events occurring in any interval $(s, t]$ of length $\tau = t - s$ is
$$m(s, t) = \lambda (t - s) = \lambda \tau.$$
Given a sample path $\{t_1, t_2, \ldots\}$ of a stationary random point process, $\lambda$ is estimated
by the number of events occurring in $[0, t]$ divided by the length of this interval:
$$\hat{\lambda} = n(t)/t.$$
In example 7.1, an estimate of the intensity of the underlying point process (assumed
to be stationary) is $\hat{\lambda} = 14/60 \approx 0.233$.
In case of a nonstationary point process, the role of the constant intensity $\lambda$ is taken
over by an intensity function $\lambda(t)$. This function makes it possible to determine the mean
number of events $m(s, t)$ occurring in an interval $(s, t]$: For any $s$, $t$ with $0 \le s < t$,

$$m(s, t) = \int_s^t \lambda(x)\, dx.$$

Specifically, the mean number of events in $[0, t]$ is the trend function of the
corresponding counting process:

$$m(t) = m(0, t) = \int_0^t \lambda(x)\, dx, \quad t \ge 0. \tag{7.13}$$
Hence, for $\Delta t \to 0$,

$$\Delta m(t) = \lambda(t)\, \Delta t + o(\Delta t), \tag{7.14}$$
so that for small $\Delta t$ the product $\lambda(t)\, \Delta t$ is approximately the mean number of events
occurring in $(t, t + \Delta t]$. Another interpretation of (7.14) is: If $\Delta t$ is sufficiently small,
then $\lambda(t)\, \Delta t$ is approximately equal to the probability of the occurrence of an event in
the interval $[t, t + \Delta t]$. Hence, the intensity function $\lambda(t)$ is the arrival rate of events
at time $t$. (For Landau's order symbol $o(x)$, see equation (2.100), page 89.)
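
To make the role of the intensity function concrete, the mean number of events in (s, t] can be approximated by numerical integration. The following is a sketch under stated assumptions: the intensity 2 + cos(x) is a hypothetical example chosen only because its integral is known in closed form, and the midpoint rule and the name `mean_events` are illustrative choices.

```python
from math import cos, sin

def mean_events(intensity, s, t, steps=10_000):
    """Approximate m(s, t), the integral of intensity(x) over (s, t],
    by the midpoint rule with the given number of subintervals."""
    width = (t - s) / steps
    return width * sum(intensity(s + (k + 0.5) * width) for k in range(steps))

def example_intensity(x):
    """Hypothetical intensity function lambda(x) = 2 + cos(x) (illustration only)."""
    return 2.0 + cos(x)

approx = mean_events(example_intensity, 0.0, 10.0)
exact = 2.0 * 10.0 + sin(10.0)    # closed form of the same integral
print(approx, exact)
```

For a constant intensity the same function returns the stationary result m(s, t) = lambda (t - s), up to rounding.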


Random Marked Point Processes Let $\{T_1, T_2, \ldots\}$ be a random point process with
random marks $M_i$ assigned to the event times $T_i$. Then the sequence

$$\{(T_1, M_1), (T_2, M_2), \ldots\} \tag{7.15}$$
is called a random marked point process. Its (2-dimensional) sample paths are given
by (7.6). The shot noise process $\{(T_n, A_n);\ n = 1, 2, \ldots\}$ considered in example 6.5 is
a special marked point process.


Random marked point processes are dealt with in full generality in Matthes et al. 
(1974); see also Stigman (1995). 


Compound Stochastic Processes Let $\{(T_1, M_1), (T_2, M_2), \ldots\}$ be a marked point
process and $\{N(t),\ t \ge 0\}$ be the counting process belonging to the point process
$\{T_1, T_2, \ldots\}$. The stochastic process $\{C(t),\ t \ge 0\}$ defined by

$$C(t) = \begin{cases} 0 & \text{for } 0 \le t < T_1, \\ \sum_{i=1}^{N(t)} M_i & \text{for } t \ge T_1 \end{cases}$$

is called a compound, cumulative, or aggregate stochastic process, and $C(t)$ is called
a compound random variable. According to the underlying point process, there are
e.g. compound Poisson processes and compound renewal processes. If $\{T_1, T_2, \ldots\}$
is a claim arrival process and $M_i$ the size of the $i$th claim, then $C(t)$ is the total claim
amount in $[0, t]$. If $T_i$ is the time of the $i$th breakdown of a machine, and $M_i$ is the
corresponding repair cost, then $C(t)$ is the total repair cost in $[0, t]$.
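
For a single sample path, the value of a compound process is simply the sum of all marks whose event times do not exceed t. The sketch below illustrates this; the breakdown times and repair costs are made-up numbers, and `compound_value` is a name chosen here.

```python
from bisect import bisect_right

def compound_value(event_times, marks, t):
    """C(t): sum of the marks M_i of all events with T_i <= t
    (0 if no event has occurred by time t). event_times must be sorted."""
    k = bisect_right(event_times, t)    # k = N(t)
    return sum(marks[:k])

# Hypothetical sample path: breakdown times and the associated repair costs.
breakdown_times = [1.5, 3.0, 7.2]
repair_costs = [100, 250, 40]

print(compound_value(breakdown_times, repair_costs, 5.0))
```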
7.2 POISSON PROCESSES


7.2.1 Homogeneous Poisson Processes 


7.2.1.1 Definition and Properties 


In the theory of stochastic processes, and maybe even more in its applications, the 
homogeneous Poisson process is just as popular as the exponential distribution in 
probability theory. Moreover, there is a close relationship between the homogeneous 
Poisson process and the exponential distribution (theorem 7.2). 


Definition 7.2 (homogeneous Poisson process) A counting process $\{N(t),\ t \ge 0\}$ is a
homogeneous Poisson process with intensity $\lambda$, $\lambda > 0$, if it has properties

1) $N(0) = 0$,

2) $\{N(t),\ t \ge 0\}$ is a stochastic process with independent increments, and

3) its increments $N(s, t) = N(t) - N(s)$, $0 \le s < t$, have a Poisson distribution with
parameter $\lambda(t - s)$:

$$P(N(s, t) = i) = \frac{(\lambda (t - s))^i}{i!}\, e^{-\lambda (t - s)};\quad i = 0, 1, \ldots, \tag{7.16}$$
or, equivalently, introducing the length $\tau = t - s$ of the interval $[s, t]$, for all $\tau > 0$,
$$P(N(s, s + \tau) = i) = \frac{(\lambda \tau)^i}{i!}\, e^{-\lambda \tau};\quad i = 0, 1, \ldots. \tag{7.17}$$

Formula (7.16) implies that the homogeneous Poisson process has homogeneous
increments. Thus, the corresponding Poisson point process $\{T_1, T_2, \ldots\}$ is stationary
in the sense of definition 7.1.


Theorem 7.1 A counting process $\{N(t),\ t \ge 0\}$ with $N(0) = 0$ is a homogeneous Poisson
process with intensity $\lambda$ if and only if it has the following properties:

a) $\{N(t),\ t \ge 0\}$ has homogeneous and independent increments.
b) The process is simple, i.e. $P(N(t, t + h) \ge 2) = o(h)$.
c) $P(N(t, t + h) = 1) = \lambda h + o(h)$.


Proof To prove that definition 7.2 implies properties a), b), and c), it is only necessary
to show that a homogeneous Poisson process satisfies properties b) and c).

b) The simplicity of the Poisson process easily results from (7.17):

$$P(N(t, t + h) \ge 2) = e^{-\lambda h} \sum_{i=2}^{\infty} \frac{(\lambda h)^i}{i!} = \lambda^2 h^2 e^{-\lambda h} \sum_{i=0}^{\infty} \frac{(\lambda h)^i}{(i + 2)!} \le \lambda^2 h^2 = o(h).$$

c) Another application of (7.17) and the simplicity of the Poisson process imply that
$$P(N(t, t + h) = 1) = 1 - P(N(t, t + h) = 0) - P(N(t, t + h) \ge 2)$$
$$= 1 - e^{-\lambda h} + o(h) = 1 - (1 - \lambda h) + o(h)$$
$$= \lambda h + o(h).$$
Conversely, it needs to be shown that a stochastic process with properties a), b), and
c) is a homogeneous Poisson process. In view of the assumed homogeneity of the
increments, it is sufficient to prove the validity of (7.17) for $s = 0$. Letting

$$p_i(t) = P(N(0, t) = i) = P(N(t) = i);\quad i = 0, 1, \ldots,$$

it is to show that

$$p_i(t) = \frac{(\lambda t)^i}{i!}\, e^{-\lambda t};\quad i = 0, 1, \ldots. \tag{7.18}$$

From a),
$$p_0(t + h) = P(N(t + h) = 0) = P(N(t) = 0,\ N(t, t + h) = 0)$$
$$= P(N(t) = 0)\, P(N(t, t + h) = 0) = p_0(t)\, p_0(h).$$
In view of b) and c), this result implies

$$p_0(t + h) = p_0(t)(1 - \lambda h) + o(h)$$
or, equivalently,

$$\frac{p_0(t + h) - p_0(t)}{h} = -\lambda\, p_0(t) + \frac{o(h)}{h}.$$

Taking the limit as $h \to 0$ yields
$$p_0'(t) = -\lambda\, p_0(t).$$
Since $p_0(0) = 1$, the solution of this differential equation is
$$p_0(t) = e^{-\lambda t}, \quad t \ge 0,$$
so that (7.18) holds for $i = 0$.


Analogously, for $i \ge 1$,
$$p_i(t + h) = P(N(t + h) = i)$$
$$= P(N(t) = i,\ N(t + h) - N(t) = 0) + P(N(t) = i - 1,\ N(t + h) - N(t) = 1)$$
$$+ \sum_{k=0}^{i-2} P(N(t) = k,\ N(t + h) - N(t) = i - k).$$

Because of b), the sum in the last row is $o(h)$. Using properties a), b), and c),
$$p_i(t + h) = p_i(t)\, p_0(h) + p_{i-1}(t)\, p_1(h) + o(h)$$
$$= p_i(t)(1 - \lambda h) + p_{i-1}(t)\, \lambda h + o(h),$$
or, equivalently,
$$\frac{p_i(t + h) - p_i(t)}{h} = -\lambda\, [p_i(t) - p_{i-1}(t)] + \frac{o(h)}{h}.$$
Taking the limit as $h \to 0$ yields a system of linear differential equations in the $p_i(t)$:
$$p_i'(t) = -\lambda\, [p_i(t) - p_{i-1}(t)];\quad i = 1, 2, \ldots.$$
Starting with $p_0(t) = e^{-\lambda t}$, the solution (7.18) is obtained by induction. ■


The practical importance of theorem 7.1 is that the properties a), b), and c) can be
verified without any quantitative investigations, only by qualitative reasoning based
on the physical or other nature of the process. In particular, the simplicity of the
homogeneous Poisson process implies that the occurrence of more than one event
at the same time point has probability 0.


Note Throughout this chapter, those events, which are generated by a Poisson process, will be 
called Poisson events. 


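Let $\{T_1, T_2, \ldots\}$ be the point process which belongs to the homogeneous Poisson
process $\{N(t),\ t \ge 0\}$, i.e. $T_n$ is the random time point at which the $n$th Poisson event
occurs. The obvious relationship

$$T_n \le t \quad \text{if and only if} \quad N(t) \ge n \tag{7.19}$$
implies
$$P(T_n \le t) = P(N(t) \ge n). \tag{7.20}$$

Therefore, $T_n$ has the distribution function
$$F_{T_n}(t) = P(N(t) \ge n) = \sum_{i=n}^{\infty} \frac{(\lambda t)^i}{i!}\, e^{-\lambda t};\quad n = 1, 2, \ldots. \tag{7.21}$$

Differentiation of $F_{T_n}(t)$ with respect to $t$ yields the density of $T_n$:

$$f_{T_n}(t) = \lambda e^{-\lambda t} \sum_{i=n}^{\infty} \frac{(\lambda t)^{i-1}}{(i - 1)!} - \lambda e^{-\lambda t} \sum_{i=n}^{\infty} \frac{(\lambda t)^i}{i!}.$$

On the right-hand side of this equation, all terms but one cancel:

$$f_{T_n}(t) = \lambda\, \frac{(\lambda t)^{n-1}}{(n - 1)!}\, e^{-\lambda t};\quad t \ge 0,\ n = 1, 2, \ldots. \tag{7.22}$$
Thus, $T_n$ has an Erlang distribution with parameters $n$ and $\lambda$ (page 75). In particular,
$T_1$ has an exponential distribution with parameter $\lambda$, and the interarrival (interevent)
times $Y_i = T_i - T_{i-1};\ i = 1, 2, \ldots;\ T_0 = 0$, are independent and identically distributed
as $T_1$. Moreover,

$$T_n = \sum_{i=1}^{n} Y_i.$$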

These results yield the most simple and, at the same time, the most important charac- 
terization of the homogeneous Poisson process: 
Theorem 7.2 Let $\{N(t),\ t \ge 0\}$ be a counting process and $\{Y_1, Y_2, \ldots\}$ be the
corresponding sequence of interarrival times. Then $\{N(t),\ t \ge 0\}$ is a homogeneous Poisson
process with intensity $\lambda$ if and only if the $Y_1, Y_2, \ldots$ are independent random variables
which are exponentially distributed with parameter $\lambda$. ■

The random counting process $\{N(t),\ t \ge 0\}$ is statistically equivalent to both its
corresponding point process $\{T_1, T_2, \ldots\}$ of event times and the sequence of interarrival
times $\{Y_1, Y_2, \ldots\}$. Hence, $\{T_1, T_2, \ldots\}$ and $\{Y_1, Y_2, \ldots\}$ are also called Poisson
processes.
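
Theorem 7.2 suggests a simple way of generating a sample path of a homogeneous Poisson process: add up independent exponential interarrival times until a given time horizon is exceeded. The following is a minimal sketch of this idea using Python's standard `random` module; the function name, the seed, and the parameter values are illustrative assumptions.

```python
import random

def simulate_poisson_event_times(lam, horizon, rng):
    """Event times T_1 < T_2 < ... in (0, horizon] of a homogeneous Poisson
    process with intensity lam, built from exponential interarrival times."""
    times = []
    t = rng.expovariate(lam)          # T_1 = Y_1
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)     # T_{n+1} = T_n + Y_{n+1}
    return times

rng = random.Random(12345)            # fixed seed, chosen arbitrarily
path = simulate_poisson_event_times(lam=0.25, horizon=24.0, rng=rng)
print(len(path), path)
```

The number of generated event times is a realization of N(24); averaged over many runs it should be close to the mean value 0.25 * 24 = 6.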


Example 7.2 From previous observations it is known that the number of traffic accidents
$N(t)$ in an area over the time interval $[0, t)$ can be modeled by a homogeneous
Poisson process $\{N(t),\ t \ge 0\}$. On average, there is one accident within 4 hours, i.e.
the intensity of the process is $\lambda = 0.25\ [h^{-1}]$.

(1) What is the probability $p$ of the event (time unit: hour)
"at most one accident in [0, 10), at least two accidents in [10, 16), and no
accident in [16, 24)"?
This probability is
$$p = P(N(10) - N(0) \le 1,\ N(16) - N(10) \ge 2,\ N(24) - N(16) = 0).$$

In view of the independence and the homogeneity of the increments of $\{N(t),\ t \ge 0\}$,
$p$ can be determined as follows:

$$p = P(N(10) - N(0) \le 1)\, P(N(16) - N(10) \ge 2)\, P(N(24) - N(16) = 0)$$
$$= P(N(10) \le 1)\, P(N(6) \ge 2)\, P(N(8) = 0).$$

$$P(N(10) \le 1) = P(N(10) = 0) + P(N(10) = 1) = e^{-0.25 \cdot 10} + 0.25 \cdot 10 \cdot e^{-0.25 \cdot 10} = 0.2873,$$
$$P(N(6) \ge 2) = 1 - e^{-0.25 \cdot 6} - 0.25 \cdot 6 \cdot e^{-0.25 \cdot 6} = 0.4422,$$
$$P(N(8) = 0) = e^{-0.25 \cdot 8} = 0.1353.$$
Hence, the desired probability is
$$p = 0.0172.$$

(2) What is the probability that the second accident occurs not before 5 hours?

Since $T_2$, the random time to the occurrence of the second accident, has an Erlang
distribution with parameters $n = 2$ and $\lambda = 0.25$,

$$P(T_2 > 5) = 1 - F_{T_2}(5) = e^{-0.25 \cdot 5}\, (1 + 0.25 \cdot 5)$$
so that $P(T_2 > 5) = 0.6446$. □
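
The numbers in example 7.2 can be checked with a short script. This is a sketch that evaluates the Poisson probabilities (7.17) and the Erlang survival probability following from (7.21); the helper names `poisson_pmf` and `erlang_survival` are chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(i, mean):
    """P(N = i) for a Poisson distributed N with the given mean, as in (7.17)."""
    return mean**i / factorial(i) * exp(-mean)

def erlang_survival(n, lam, t):
    """P(T_n > t) = P(N(t) < n), the complement of (7.21)."""
    return sum(poisson_pmf(i, lam * t) for i in range(n))

lam = 0.25                                                     # accidents per hour
p1 = poisson_pmf(0, lam * 10) + poisson_pmf(1, lam * 10)      # P(N(10) <= 1)
p2 = 1 - poisson_pmf(0, lam * 6) - poisson_pmf(1, lam * 6)    # P(N(6) >= 2)
p3 = poisson_pmf(0, lam * 8)                                   # P(N(8) = 0)
p = p1 * p2 * p3
q = erlang_survival(2, lam, 5)                                 # P(T_2 > 5)

print(round(p1, 4), round(p2, 4), round(p3, 4), round(p, 4), round(q, 4))
```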
The following examples make use of the hyperbolic sine and cosine functions:

$$\sinh x = \frac{e^x - e^{-x}}{2}, \quad \cosh x = \frac{e^x + e^{-x}}{2}, \quad x \in (-\infty, +\infty).$$


Example 7.3 (random telegraph signal) A random signal $X(t)$ has structure
$$X(t) = Y\, (-1)^{N(t)}, \quad t \ge 0, \tag{7.23}$$
where $\{N(t),\ t \ge 0\}$ is a homogeneous Poisson process with intensity $\lambda$ and $Y$ is a
binary random variable with
$$P(Y = 1) = P(Y = -1) = 1/2,$$

which is independent of $N(t)$ for all $t$. Signals of this structure are called random
telegraph signals. Random telegraph signals are basic modules for generating signals of
more complicated structure. Obviously, $X(t) = 1$ or $X(t) = -1$, and $Y$ determines the
sign of $X(0)$. Figure 7.2 shows a sample path $x = x(t)$ of the process $\{X(t),\ t \ge 0\}$ on
condition $Y = 1$ and $T_n = t_n;\ n = 1, 2, \ldots$.

$\{X(t),\ t \ge 0\}$ is a weakly stationary process. To see this, firstly note that
$$|X(t)|^2 = 1 < \infty \quad \text{for all } t \ge 0.$$
Hence, $\{X(t),\ t \ge 0\}$ is a second-order process. With
$$I(t) = (-1)^{N(t)},$$
its trend function is $m(t) = E(X(t)) = E(Y)\, E(I(t))$. Since $E(Y) = 0$,
$$m(t) \equiv 0.$$

It remains to show that the covariance function $C(s, t)$ of this process depends only
on $|t - s|$. This requires knowledge of the probability distribution of $I(t)$:

A transition from $I(t) = -1$ to $I(t) = +1$ or, conversely, a transition from $I(t) = +1$ to
$I(t) = -1$ occurs at those time points at which Poisson events occur:

$$P(I(t) = 1) = P(\text{even number of jumps in } (0, t])$$
$$= e^{-\lambda t} \sum_{i=0}^{\infty} \frac{(\lambda t)^{2i}}{(2i)!} = e^{-\lambda t} \cosh \lambda t.$$

Figure 7.2 Sample path of the random telegraph signal
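Analogously,
$$P(I(t) = -1) = P(\text{odd number of jumps in } (0, t])$$
$$= e^{-\lambda t} \sum_{i=0}^{\infty} \frac{(\lambda t)^{2i+1}}{(2i + 1)!} = e^{-\lambda t} \sinh \lambda t.$$

Hence the mean value of $I(t)$ is
$$E[I(t)] = 1 \cdot P(I(t) = 1) + (-1) \cdot P(I(t) = -1)$$
$$= e^{-\lambda t}\, [\cosh \lambda t - \sinh \lambda t] = e^{-2 \lambda t}.$$
Since
$$C(s, t) = Cov[X(s), X(t)]$$
$$= E[X(s)\, X(t)] = E[Y I(s)\, Y I(t)]$$
$$= E[Y^2 I(s)\, I(t)] = E(Y^2)\, E[I(s)\, I(t)]$$
and $E(Y^2) = 1$, the covariance function of $\{X(t),\ t \ge 0\}$ has structure
$$C(s, t) = E[I(s)\, I(t)].$$
In order to evaluate $C(s, t)$, the joint distribution of $(I(s), I(t))$ has to be determined:
From (1.22), page 24, and the homogeneity of the increments of $\{N(t),\ t \ge 0\}$, assuming
$s < t$,
$$p_{1,1} = P(I(s) = 1,\ I(t) = 1) = P(I(s) = 1)\, P(I(t) = 1 \mid I(s) = 1)$$
$$= e^{-\lambda s} \cosh \lambda s \cdot P(\text{even number of jumps in } (s, t])$$
$$= e^{-\lambda s} \cosh \lambda s \cdot e^{-\lambda (t - s)} \cosh \lambda (t - s)$$
$$= e^{-\lambda t} \cosh \lambda s\, \cosh \lambda (t - s).$$
Analogously,
$$p_{1,-1} = P(I(s) = 1,\ I(t) = -1) = e^{-\lambda t} \cosh \lambda s\, \sinh \lambda (t - s),$$
$$p_{-1,1} = P(I(s) = -1,\ I(t) = 1) = e^{-\lambda t} \sinh \lambda s\, \sinh \lambda (t - s),$$
$$p_{-1,-1} = P(I(s) = -1,\ I(t) = -1) = e^{-\lambda t} \sinh \lambda s\, \cosh \lambda (t - s).$$
Now
$$E[I(s)\, I(t)] = p_{1,1} + p_{-1,-1} - p_{1,-1} - p_{-1,1},$$
so that

$$C(s, t) = e^{-2 \lambda (t - s)}, \quad s < t.$$
Since the roles of $s$ and $t$ can be interchanged,
$$C(s, t) = e^{-2 \lambda |t - s|}.$$

Hence, the random telegraph signal $\{X(t),\ t \ge 0\}$ is a weakly stationary process. □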
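
The last step of example 7.3 can be checked numerically: combining the four joint probabilities of (I(s), I(t)) must reproduce exp(-2 lambda |t - s|). The sketch below does this for arbitrary test values; the function names and the chosen values of lambda, s, and t are assumptions made only for illustration.

```python
from math import cosh, exp, sinh

def joint_probabilities(lam, s, t):
    """Joint probabilities of (I(s), I(t)) for s <= t, as derived in example 7.3.
    Returned in the order (p_{1,1}, p_{1,-1}, p_{-1,1}, p_{-1,-1})."""
    d = t - s
    scale = exp(-lam * t)
    p_pp = scale * cosh(lam * s) * cosh(lam * d)   # even jumps, then even
    p_pm = scale * cosh(lam * s) * sinh(lam * d)   # even jumps, then odd
    p_mp = scale * sinh(lam * s) * sinh(lam * d)   # odd jumps, then odd
    p_mm = scale * sinh(lam * s) * cosh(lam * d)   # odd jumps, then even
    return p_pp, p_pm, p_mp, p_mm

def telegraph_covariance(lam, s, t):
    """C(s, t) = E[I(s) I(t)] computed from the four joint probabilities."""
    if s > t:
        s, t = t, s
    p_pp, p_pm, p_mp, p_mm = joint_probabilities(lam, s, t)
    return p_pp + p_mm - p_pm - p_mp

lam, s, t = 0.7, 1.2, 3.0
print(telegraph_covariance(lam, s, t), exp(-2 * lam * abs(t - s)))
```

The four probabilities also add up to 1, which is a useful sanity check on the derivation.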
Theorem 7.3 Let $\{N(t),\ t \ge 0\}$ be a homogeneous Poisson process with intensity $\lambda$.
Then the random number of Poisson events which occur in the interval $(0, s]$ on
condition that exactly $n$ events occur in $(0, t]$, $s < t$, has a binomial distribution
with parameters $p = s/t$ and $n$.

Proof In view of the homogeneity and independence of the increments of the Poisson
process $\{N(t),\ t \ge 0\}$,

$$P(N(s) = i \mid N(t) = n) = \frac{P(N(s) = i,\ N(t) = n)}{P(N(t) = n)} = \frac{P(N(s) = i,\ N(s, t) = n - i)}{P(N(t) = n)}$$

$$= \frac{P(N(s) = i)\, P(N(s, t) = n - i)}{P(N(t) = n)} = \frac{\dfrac{(\lambda s)^i}{i!}\, e^{-\lambda s}\, \dfrac{[\lambda (t - s)]^{n-i}}{(n - i)!}\, e^{-\lambda (t - s)}}{\dfrac{(\lambda t)^n}{n!}\, e^{-\lambda t}}$$

$$= \binom{n}{i} \left(\frac{s}{t}\right)^i \left(1 - \frac{s}{t}\right)^{n-i};\quad i = 0, 1, \ldots, n. \tag{7.24}$$

This proves the theorem. ■
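
Formula (7.24) is easy to verify numerically: the ratio of Poisson probabilities on the left must agree with the binomial probability on the right, and the intensity must cancel. The following sketch compares both sides for arbitrary test values (the helper names and the values of n, s, t, and lambda are assumptions made for illustration).

```python
from math import comb, exp, factorial

def poisson_pmf(i, mean):
    """P(N = i) for a Poisson distributed N with the given mean."""
    return mean**i / factorial(i) * exp(-mean)

def conditional_pmf(i, n, s, t, lam):
    """P(N(s) = i | N(t) = n), using independent, homogeneous increments."""
    return (poisson_pmf(i, lam * s) * poisson_pmf(n - i, lam * (t - s))
            / poisson_pmf(n, lam * t))

def binomial_pmf(i, n, p):
    """Binomial probability with parameters n and p."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, s, t, lam = 5, 2.0, 7.0, 1.3
for i in range(n + 1):
    print(i, conditional_pmf(i, n, s, t, lam), binomial_pmf(i, n, s / t))
```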


7.2.1.2 Homogeneous Poisson Process and Uniform Distribution 

Theorem 7.3 implies that on condition '$N(t) = 1$' the random time $T_1$ to the first and
only event occurring in $[0, t]$ is uniformly distributed over this interval, since, from
(7.24), for $s < t$,

$$P(T_1 \le s \mid N(t) = 1) = P(N(s) = 1 \mid N(t) = 1) = \frac{s}{t}.$$

This relationship between the homogeneous Poisson process and the uniform
distribution is a special case of a more general result. To prove it, the joint probability
density of the random vector $(T_1, T_2, \ldots, T_n)$ is needed.


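Theorem 7.4 The joint probability density of the random vector $(T_1, T_2, \ldots, T_n)$ is

$$f(t_1, t_2, \ldots, t_n) = \begin{cases} \lambda^n e^{-\lambda t_n} & \text{for } 0 \le t_1 < t_2 < \cdots < t_n, \\ 0 & \text{elsewhere.} \end{cases} \tag{7.25}$$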


Proof For $0 \le t_1 < t_2$, the joint distribution function of $(T_1, T_2)$ is
$$P(T_1 \le t_1,\ T_2 \le t_2) = \int_0^{t_1} P(T_2 \le t_2 \mid T_1 = t)\, f_{T_1}(t)\, dt.$$
By theorem 7.2, the interarrival times
$$Y_i = T_i - T_{i-1};\quad i = 1, 2, \ldots$$

are independent, identically distributed random variables which have an exponential
distribution with parameter $\lambda$.
Hence, since $T_1 = Y_1$,
$$P(T_1 \le t_1,\ T_2 \le t_2) = \int_0^{t_1} P(T_2 \le t_2 \mid T_1 = t)\, \lambda e^{-\lambda t}\, dt.$$
Given '$T_1 = t$', the random events
'$T_2 \le t_2$' and '$Y_2 \le t_2 - t$'
are equivalent. Thus, the desired two-dimensional distribution function is

$$F(t_1, t_2) = P(T_1 \le t_1,\ T_2 \le t_2) = \int_0^{t_1} \left(1 - e^{-\lambda (t_2 - t)}\right) \lambda e^{-\lambda t}\, dt$$

$$= \lambda \int_0^{t_1} \left(e^{-\lambda t} - e^{-\lambda t_2}\right) dt.$$

Therefore,
$$F(t_1, t_2) = 1 - e^{-\lambda t_1} - \lambda t_1 e^{-\lambda t_2}, \quad t_1 < t_2.$$

Partial differentiation yields the corresponding two-dimensional probability density

$$f(t_1, t_2) = \begin{cases} \lambda^2 e^{-\lambda t_2} & \text{for } 0 \le t_1 < t_2, \\ 0 & \text{elsewhere.} \end{cases}$$
The proof of the theorem is now easily completed by induction. ■


The formulation of the following theorem requires a result from the theory of ordered
samples: Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample taken from $X$, i.e. the $X_i$ are
independent random variables which are identically distributed as $X$. The corresponding
ordered sample is denoted as

$$(X_1^*, X_2^*, \ldots, X_n^*), \quad X_1^* \le X_2^* \le \cdots \le X_n^*.$$

Given that $X$ has a uniform distribution over $[0, x]$, the joint probability density of
the random vector $\{X_1^*, X_2^*, \ldots, X_n^*\}$ is

$$f^*(x_1^*, x_2^*, \ldots, x_n^*) = \begin{cases} n!/x^n, & 0 \le x_1^* < x_2^* < \cdots < x_n^* \le x, \\ 0, & \text{elsewhere.} \end{cases} \tag{7.26}$$

For the sake of comparison: The joint probability density of the original (unordered)
sample $\{X_1, X_2, \ldots, X_n\}$ is

$$f(x_1, x_2, \ldots, x_n) = \begin{cases} 1/x^n, & 0 \le x_i \le x, \\ 0, & \text{elsewhere.} \end{cases} \tag{7.27}$$


Theorem 7.5 Let $\{N(t),\ t \ge 0\}$ be a homogeneous Poisson process with intensity $\lambda$,
and let $T_i$ be the $i$th event time; $i = 1, 2, \ldots$; $T_0 = 0$. Given $N(t) = n$; $n = 1, 2, \ldots$, the
random vector $\{T_1, T_2, \ldots, T_n\}$ has the same joint probability density as an ordered
random sample taken from a uniform distribution over $[0, t]$.
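Proof By definition, for disjoint, but otherwise arbitrary subintervals $[t_i, t_i + h_i]$ of
$[0, t]$, the joint probability density of $\{T_1, T_2, \ldots, T_n\}$ on condition $N(t) = n$ is

$$f(t_1, t_2, \ldots, t_n \mid N(t) = n) = \lim_{\max(h_1, h_2, \ldots, h_n) \to 0} \frac{P(t_i \le T_i < t_i + h_i;\ i = 1, 2, \ldots, n \mid N(t) = n)}{h_1 h_2 \cdots h_n}.$$

Since the event '$N(t) = n$' is equivalent to $T_n \le t < T_{n+1}$,
$$P(t_i \le T_i < t_i + h_i;\ i = 1, 2, \ldots, n \mid N(t) = n)$$

$$= \frac{P(t_i \le T_i < t_i + h_i,\ i = 1, 2, \ldots, n;\ t < T_{n+1})}{P(N(t) = n)}$$

$$= \frac{\displaystyle \int_t^{\infty} \int_{t_n}^{t_n + h_n} \int_{t_{n-1}}^{t_{n-1} + h_{n-1}} \cdots \int_{t_1}^{t_1 + h_1} \lambda^{n+1} e^{-\lambda x_{n+1}}\, dx_1 \cdots dx_n\, dx_{n+1}}{\dfrac{(\lambda t)^n}{n!}\, e^{-\lambda t}}$$

$$= \frac{h_1 h_2 \cdots h_n\, \lambda^n e^{-\lambda t}}{\dfrac{(\lambda t)^n}{n!}\, e^{-\lambda t}} = \frac{h_1 h_2 \cdots h_n}{t^n}\, n!.$$

Hence, the desired conditional joint probability density is

$$f(t_1, t_2, \ldots, t_n \mid N(t) = n) = \begin{cases} \dfrac{n!}{t^n}, & 0 \le t_1 < t_2 < \cdots < t_n \le t, \\ 0, & \text{elsewhere.} \end{cases} \tag{7.28}$$
Apart from the notation of the variables, this is the joint density (7.26). ■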


The relationship between homogeneous Poisson processes and the uniform distribution
proved in this theorem motivates the common phrase that a homogeneous
Poisson process is a purely random process, since on condition $N(t) = n$, the event
times $T_1, T_2, \ldots, T_n$ are 'purely randomly' distributed over $[0, t]$.
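
Theorem 7.5 translates directly into a sampling recipe: given that n events occurred in [0, t], their times can be generated by sorting n independent uniform random numbers. The sketch below illustrates this with Python's standard `random` module; the function name, the seed, and the parameter values are assumptions made for illustration.

```python
import random

def event_times_given_count(n, t, rng):
    """Sample (T_1, ..., T_n) on condition N(t) = n as the ordered sample
    of n independent random numbers, uniformly distributed over [0, t]."""
    return sorted(rng.uniform(0.0, t) for _ in range(n))

rng = random.Random(7)                 # fixed seed, chosen arbitrarily
sample = event_times_given_count(n=3, t=12.0, rng=rng)
print(sample)
```

As a plausibility check, the smallest of n ordered uniform points on [0, t] has mean t/(n + 1), so averaging the first event time over many samples with n = 3 and t = 12 should give a value near 3.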


Example 7.4 (shot noise) Shot noise processes have been formally introduced in 
example 6.5 (page 229). Now an application is discussed in detail: 

In the circuit depicted in Figure 7.3, a light source is switched on at time $t = 0$. A
current pulse is initiated in the circuit as soon as the cathode emits a photoelectron
due to the light falling on it. Such a current pulse can be quantified by a function $h(t)$
with properties

$$h(t) \ge 0, \quad h(t) = 0 \text{ for } t < 0, \quad \text{and} \quad \int_0^{\infty} h(t)\, dt < \infty. \tag{7.29}$$

Figure 7.3 Photodetection circuit (Example 7.4)

Let $T_1, T_2, \ldots$ be the sequence of random time points at which the cathode emits
photoelectrons and $\{N(t),\ t \ge 0\}$ be the corresponding counting process. Then the
total current flowing in the circuit at time $t$ is
$$X(t) = \sum_{i=1}^{\infty} h(t - T_i). \tag{7.30}$$
In view of the properties (7.29) of $h(t)$, $X(t)$ can also be written in the form
$$X(t) = \sum_{i=1}^{N(t)} h(t - T_i).$$

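In what follows, $\{N(t), t \ge 0\}$ is assumed to be a homogeneous Poisson process with parameter $\lambda$. For determining the trend function of this shot noise $\{X(t), t \ge 0\}$, note that according to theorem 7.5, on condition '$N(t) = n$', the $T_1, T_2, \ldots, T_n$ are uniformly distributed over $[0, t]$. Hence,

$$E(h(t - T_i) \mid N(t) = n) = \frac{1}{t} \int_0^t h(t - x)\, dx = \frac{1}{t} \int_0^t h(x)\, dx.$$

Therefore,

$$E(X(t) \mid N(t) = n) = E\Big( \sum_{i=1}^{n} h(t - T_i) \,\Big|\, N(t) = n \Big) = \sum_{i=1}^{n} E(h(t - T_i) \mid N(t) = n) = \Big( \frac{1}{t} \int_0^t h(x)\, dx \Big)\, n.$$

The total probability rule (1.7) yields

$$E(X(t)) = \sum_{n=0}^{\infty} E(X(t) \mid N(t) = n)\, P(N(t) = n) = \frac{1}{t} \int_0^t h(x)\, dx\ \sum_{n=1}^{\infty} n\, \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}$$

$$= \Big( \frac{1}{t} \int_0^t h(x)\, dx \Big) E(N(t)) = \Big( \frac{1}{t} \int_0^t h(x)\, dx \Big) (\lambda t).$$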




Therefore, the trend function of this shot noise process is

$$m(t) = \lambda \int_0^t h(x)\, dx. \tag{7.31}$$


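In order to obtain its covariance function, the mean value $E(X(s) X(t))$ has to be determined:

$$E(X(s) X(t)) = \sum_{i,j=1}^{\infty} E[h(s - T_i)\, h(t - T_j)] = \sum_{i=1}^{\infty} E(h(s - T_i)\, h(t - T_i)) + \sum_{i,j=1,\ i \ne j}^{\infty} E[h(s - T_i)\, h(t - T_j)].$$

Since, on condition '$N(t) = n$', the $T_1, T_2, \ldots, T_n$ are uniformly distributed over $[0, t]$,

$$E(h(s - T_i)\, h(t - T_i) \mid N(t) = n) = \frac{1}{t} \int_0^t h(s - x)\, h(t - x)\, dx.$$

For $s < t$,

$$E(h(s - T_i)\, h(t - T_i) \mid N(t) = n) = \frac{1}{t} \int_0^s h(x)\, h(t - s + x)\, dx.$$

By theorem 7.5, on condition '$N(t) = n$' the $T_1, T_2, \ldots, T_n$ are independent. Hence, for $i \ne j$,

$$E(h(s - T_i)\, h(t - T_j) \mid N(t) = n) = E(h(s - T_i) \mid N(t) = n)\, E(h(t - T_j) \mid N(t) = n)$$

$$= \Big( \frac{1}{t} \int_0^t h(s - x)\, dx \Big) \Big( \frac{1}{t} \int_0^t h(t - x)\, dx \Big) = \Big( \frac{1}{t} \int_0^s h(x)\, dx \Big) \Big( \frac{1}{t} \int_0^t h(x)\, dx \Big).$$

Thus, for $s < t$,

$$E(X(s) X(t) \mid N(t) = n) = \Big( \frac{1}{t} \int_0^s h(x)\, h(t - s + x)\, dx \Big)\, n + \Big( \frac{1}{t} \int_0^s h(x)\, dx \Big) \Big( \frac{1}{t} \int_0^t h(x)\, dx \Big) (n - 1)\, n.$$

Applying once more the total probability rule,

$$E(X(s) X(t)) = \Big( \frac{1}{t} \int_0^s h(x)\, h(t - s + x)\, dx \Big) E(N(t)) + \Big( \frac{1}{t} \int_0^s h(x)\, dx \Big) \Big( \frac{1}{t} \int_0^t h(x)\, dx \Big) \big[ E(N^2(t)) - E(N(t)) \big].$$

Making use of equations (7.31) and (6.4), page 226, as well as

$$E(N(t)) = \lambda t \quad \text{and} \quad E(N^2(t)) = \lambda t (\lambda t + 1),$$

yields the covariance function:

$$C(s, t) = \lambda \int_0^s h(x)\, h(t - s + x)\, dx, \quad s < t.$$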




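More generally, for any $s$ and $t$, $C(s, t)$ can be written in the form

$$C(s, t) = \lambda \int_0^{\min(s,t)} h(x)\, h(|t - s| + x)\, dx. \tag{7.32}$$

Letting $s = t$ yields the variance of $X(t)$:

$$\operatorname{Var}(X(t)) = \lambda \int_0^t h^2(x)\, dx.$$

If $s$ tends to infinity in such a way that $\tau = t - s$ stays constant, trend and covariance function become

$$m = \lambda \int_0^{\infty} h(x)\, dx, \qquad C(\tau) = \lambda \int_0^{\infty} h(x)\, h(|\tau| + x)\, dx. \tag{7.33}$$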


These two formulas are known as Campbell's theorem. They imply that, for large $t$, the shot noise process $\{X(t), t \ge 0\}$ is approximately weakly stationary. For more general formulations of this theorem see Brandt et al. (1990) and Stigman (1995).
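
The trend function (7.31) and the variance formula can be compared with simulated values. The sketch below is an illustration only: the pulse shape $h(u) = e^{-u}$ for $u \ge 0$, the parameter values and all function names are assumptions chosen for the example. For this pulse, (7.31) gives $m(t) = \lambda (1 - e^{-t})$ and the variance is $\lambda (1 - e^{-2t})/2$.

```python
import math
import random

def shot_noise_sample(lam, t, rng):
    """One realization of X(t) = sum_i h(t - T_i) with the assumed
    pulse h(u) = exp(-u) for u >= 0 and a rate-lam Poisson input."""
    x, s = 0.0, rng.expovariate(lam)
    while s <= t:
        x += math.exp(-(t - s))
        s += rng.expovariate(lam)
    return x

def estimate_moments(lam=3.0, t=2.0, runs=20000, seed=7):
    """Sample mean and sample variance of X(t)."""
    rng = random.Random(seed)
    xs = [shot_noise_sample(lam, t, rng) for _ in range(runs)]
    mean = sum(xs) / runs
    var = sum((x - mean) ** 2 for x in xs) / (runs - 1)
    return mean, var

lam, t = 3.0, 2.0
mean_mc, var_mc = estimate_moments(lam, t)
mean_th = lam * (1 - math.exp(-t))           # trend function (7.31)
var_th = lam * (1 - math.exp(-2 * t)) / 2    # variance formula with s = t
print(mean_mc, mean_th, var_mc, var_th)
```

For large $t$ both theoretical values approach the constants $\lambda$ and $\lambda/2$ given by (7.33) with $\tau = 0$, which is the approximate stationarity mentioned above.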

If the current impulses induced by photoelectrons have random intensities $A_i$, then the total current flowing in the circuit at time $t$ is

$$X(t) = \sum_{i=1}^{N(t)} A_i\, h(t - T_i).$$

If the $A_i$ are identically distributed as $A$ with $E(A^2) < \infty$, independent of each other, and independent of all $T_k$, then determining trend and covariance function of this generalized shot noise $\{X(t), t \ge 0\}$ does not give rise to principally new problems:

$$m(t) = \lambda E(A) \int_0^t h(x)\, dx, \tag{7.34}$$

$$C(s, t) = \lambda E(A^2) \int_0^{\min(s,t)} h(x)\, h(|t - s| + x)\, dx. \tag{7.35}$$

If the process of inducing current impulses by photoelectrons has already been operating for an unboundedly long time (the circuit was switched on a sufficiently long time ago), then the underlying shot noise process $\{X(t), t \in (-\infty, +\infty)\}$ is given by

$$X(t) = \sum_{i=-\infty}^{+\infty} A_i\, h(t - T_i).$$

In this case the process is a priori stationary. □


Example 7.5 Customers arrive at a service station (service system, queueing system) according to a homogeneous Poisson process $\{N(t), t \ge 0\}$ with intensity $\lambda$. Hence, the arrival of a customer is a Poisson event. The number of servers in the system is assumed to be so large that an incoming customer will always find an available server. Therefore, the service system can be modeled as having an infinite number of servers. The service times of all customers are assumed to be independent random variables, which are identically distributed as $Z$.

Let $G(t) = P(Z \le t)$ be the distribution function of $Z$, and $X(t)$ be the random number of customers in the system at time $t$, $X(0) = 0$. The aim is to determine the state probabilities $p_i(t)$ of the system:

$$p_i(t) = P(X(t) = i); \quad i = 0, 1, \ldots;\ t \ge 0.$$

A customer arriving at time $x$ is still in the system at time $t$, $t > x$, with probability $1 - G(t - x)$, i.e. its service has not yet been finished by $t$. Given $N(t) = n$, the arrival times $T_1, T_2, \ldots, T_n$ of the $n$ customers in the system are, by theorem 7.5, independent and uniformly distributed over $[0, t]$. For calculating the state probabilities, the order of the $T_i$ is not relevant. Thus, the probability that any of the $n$ customers who arrived in $[0, t]$ is still in the system at time $t$ is

$$p(t) = \int_0^t (1 - G(t - x))\, \frac{1}{t}\, dx = \frac{1}{t} \int_0^t (1 - G(x))\, dx.$$

Since, by assumption, the service times are independent of each other,

$$P(X(t) = i \mid N(t) = n) = \binom{n}{i} [p(t)]^i\, [1 - p(t)]^{n-i}; \quad i = 0, 1, \ldots, n.$$
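By the total probability rule (1.24),

$$p_i(t) = \sum_{n=i}^{\infty} P(X(t) = i \mid N(t) = n)\, P(N(t) = n) = \sum_{n=i}^{\infty} \binom{n}{i} [p(t)]^i\, [1 - p(t)]^{n-i}\, \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}.$$

This is a mixture of binomial distributions with regard to a Poisson structure distribution. Thus, from example 2.24, page 93, if there the parameter $\lambda$ is replaced with $\lambda t$, the state probabilities of the system are

$$p_i(t) = \frac{[\lambda t\, p(t)]^i}{i!}\, e^{-\lambda t\, p(t)}; \quad i = 0, 1, \ldots$$

Hence, $X(t)$ has a Poisson distribution with parameter

$$E(X(t)) = \lambda t\, p(t)$$

so that the trend function of $\{X(t), t \ge 0\}$ becomes

$$m(t) = \lambda \int_0^t (1 - G(x))\, dx, \quad t \ge 0.$$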


For $t \to \infty$ the trend function tends to

$$\lim_{t \to \infty} m(t) = \frac{E(Z)}{E(Y)}, \tag{7.36}$$

where $E(Y) = 1/\lambda$ is the mean interarrival time and $E(Z)$ the mean service time of a customer:

$$E(Z) = \int_0^{\infty} (1 - G(x))\, dx.$$

By letting $\rho = E(Z)/E(Y)$, the stationary state probabilities of the system become

$$p_i = \lim_{t \to \infty} p_i(t) = \frac{\rho^i}{i!}\, e^{-\rho}; \quad i = 0, 1, \ldots. \tag{7.37}$$

If $Z$ has an exponential distribution with parameter $\mu$, then

$$m(t) = \lambda \int_0^t e^{-\mu x}\, dx = \frac{\lambda}{\mu} \big( 1 - e^{-\mu t} \big).$$

In this case, $\rho = \lambda/\mu$. □
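
The result of Example 7.5 for exponentially distributed service times can be illustrated by a short simulation. This is a sketch under assumed parameter values ($\lambda = 4$, $\mu = 2$, $t = 1.5$); the function names are hypothetical. It compares the simulated mean of $X(t)$ with $m(t)$ and the simulated frequency of an empty system with the Poisson probability $e^{-m(t)}$.

```python
import math
import random

def customers_in_system(lam, mu, t, rng):
    """Number of customers still in service at time t in a system with
    infinitely many servers, Poisson arrivals (rate lam), exponential
    service times (rate mu), and X(0) = 0."""
    count, arrival = 0, rng.expovariate(lam)
    while arrival <= t:
        if arrival + rng.expovariate(mu) > t:   # service not finished by t
            count += 1
        arrival += rng.expovariate(lam)
    return count

def estimate(lam=4.0, mu=2.0, t=1.5, runs=20000, seed=11):
    """Sample mean of X(t) and relative frequency of X(t) = 0."""
    rng = random.Random(seed)
    xs = [customers_in_system(lam, mu, t, rng) for _ in range(runs)]
    return sum(xs) / runs, xs.count(0) / runs

mean_mc, p0_mc = estimate()
m_t = (4.0 / 2.0) * (1 - math.exp(-2.0 * 1.5))   # trend function m(t)
p0_th = math.exp(-m_t)                            # Poisson probability of state 0
print(mean_mc, m_t, p0_mc, p0_th)
```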


7.2.2 Nonhomogeneous Poisson Processes


In this section a stochastic process is investigated which, except for the homogeneity of its increments, has all the other properties listed in theorem 7.1. Abandoning the assumption of homogeneous increments implies that a time-dependent intensity function $\lambda = \lambda(t)$ takes over the role of $\lambda$. This leads to the concept of a nonhomogeneous Poisson process. As proposed in section 7.1, the following notation will be used:

$$N(s, t) = N(t) - N(s), \quad 0 \le s < t.$$

Definition 7.3 A counting process $\{N(t), t \ge 0\}$ with $N(0) = 0$ is called a nonhomogeneous Poisson process with intensity function $\lambda(t)$ if it has properties
(1) $\{N(t), t \ge 0\}$ has independent increments,
(2) $P(N(t, t+h) \ge 2) = o(h)$,
(3) $P(N(t, t+h) = 1) = \lambda(t)\, h + o(h)$.
Three problems will be considered:
1) Computation of the probability distribution of its increments $N(s, t)$:

$$p_i(s, t) = P(N(s, t) = i); \quad 0 \le s < t,\ i = 0, 1, \ldots.$$

2) Computation of the probability density of the random event time $T_i$ (time point at which the $i$th Poisson event occurs).

3) Computation of the joint probability density of $(T_1, T_2, \ldots, T_n)$; $n = 1, 2, \ldots$.
1) In view of the assumed independence of the increments, for $h > 0$,

$$p_0(s, t+h) = P(N(s, t+h) = 0) = P(N(s, t) = 0,\ N(t, t+h) = 0)$$

$$= P(N(s, t) = 0) \cdot P(N(t, t+h) = 0) = p_0(s, t)\, [1 - \lambda(t)\, h + o(h)].$$

Thus,

$$\frac{p_0(s, t+h) - p_0(s, t)}{h} = -\lambda(t)\, p_0(s, t) + \frac{o(h)}{h}.$$

Letting $h \to 0$ yields a partial differential equation of the first order:

$$\frac{\partial}{\partial t}\, p_0(s, t) = -\lambda(t)\, p_0(s, t).$$

Since $N(0) = 0$ or, equivalently, $p_0(0, 0) = 1$, the solution is

$$p_0(s, t) = e^{-[\Lambda(t) - \Lambda(s)]}, \tag{7.38}$$

where

$$\Lambda(x) = \int_0^x \lambda(u)\, du; \quad x \ge 0. \tag{7.39}$$


Starting with $p_0(s, t)$, the probabilities $p_i(s, t)$ for $i \ge 1$ can be determined by induction:

$$p_i(s, t) = \frac{[\Lambda(t) - \Lambda(s)]^i}{i!}\, e^{-[\Lambda(t) - \Lambda(s)]}; \quad i = 0, 1, 2, \ldots. \tag{7.40}$$

In particular, the absolute state probabilities

$$p_i(t) = p_i(0, t) = P(N(t) = i)$$

of the nonhomogeneous Poisson process at time $t$ are

$$p_i(t) = \frac{[\Lambda(t)]^i}{i!}\, e^{-\Lambda(t)}; \quad i = 0, 1, 2, \ldots. \tag{7.41}$$

Hence, the mean number of Poisson events $m(s, t) = E(N(s, t))$ occurring in the interval $(s, t]$, $s < t$, is

$$m(s, t) = \Lambda(t) - \Lambda(s) = \int_s^t \lambda(x)\, dx. \tag{7.42}$$

In particular, the trend function $m(t) = m(0, t)$ of $\{N(t), t \ge 0\}$ is

$$m(t) = \Lambda(t) = \int_0^t \lambda(x)\, dx, \quad t \ge 0.$$


2) Let $F_{T_1}(t) = P(T_1 \le t)$ be the distribution function and $f_{T_1}(t)$ the probability density of the random time $T_1$ to the occurrence of the first Poisson event. Then

$$p_0(t) = p_0(0, t) = P(T_1 > t) = 1 - F_{T_1}(t).$$

From (7.38),

$$p_0(t) = e^{-\Lambda(t)}.$$

Hence,

$$F_{T_1}(t) = 1 - e^{-\Lambda(t)}, \quad f_{T_1}(t) = \lambda(t)\, e^{-\Lambda(t)}, \quad t \ge 0. \tag{7.43}$$

A comparison of (7.43) with formula (2.98) (page 88) shows that the intensity function $\lambda(t)$ of the nonhomogeneous Poisson process $\{N(t), t \ge 0\}$ is identical to the failure rate belonging to $T_1$. Since

$$F_{T_n}(t) = P(T_n \le t) = P(N(t) \ge n),$$

the distribution function of the $n$th event time $T_n$ is

$$F_{T_n}(t) = \sum_{i=n}^{\infty} \frac{[\Lambda(t)]^i}{i!}\, e^{-\Lambda(t)}, \quad n = 1, 2, \ldots. \tag{7.44}$$


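Differentiation with respect to $t$ yields the probability density of $T_n$:

$$f_{T_n}(t) = \frac{[\Lambda(t)]^{n-1}}{(n-1)!}\, \lambda(t)\, e^{-\Lambda(t)}; \quad t \ge 0,\ n = 1, 2, \ldots. \tag{7.45}$$

Equivalently,

$$f_{T_n}(t) = \frac{[\Lambda(t)]^{n-1}}{(n-1)!}\, f_{T_1}(t); \quad n = 1, 2, \ldots.$$

By formula (2.52), page 64, and formula (7.44), the mean value of $T_n$ is

$$E(T_n) = \int_0^{\infty} e^{-\Lambda(t)} \Big( \sum_{i=0}^{n-1} \frac{[\Lambda(t)]^i}{i!} \Big)\, dt. \tag{7.46}$$

Hence, the mean time

$$E(Y_n) = E(T_n) - E(T_{n-1})$$

between the $(n-1)$th and the $n$th event is

$$E(Y_n) = \frac{1}{(n-1)!} \int_0^{\infty} [\Lambda(t)]^{n-1}\, e^{-\Lambda(t)}\, dt; \quad n = 1, 2, \ldots. \tag{7.47}$$

Letting $\lambda(t) \equiv \lambda$ and $\Lambda(t) = \lambda t$ yields the corresponding characteristics for the homogeneous Poisson process, in particular $E(Y_n) = 1/\lambda$.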


3) The conditional probability $P(T_2 \le t_2 \mid T_1 = t_1)$ is equal to the probability that at least one Poisson event occurs in $(t_1, t_2]$, $t_1 < t_2$. Thus, from (7.40),

$$F_{T_2}(t_2 \mid t_1) = 1 - p_0(t_1, t_2) = 1 - e^{-[\Lambda(t_2) - \Lambda(t_1)]}. \tag{7.48}$$

Differentiation with respect to $t_2$ yields the corresponding probability density:

$$f_{T_2}(t_2 \mid t_1) = \lambda(t_2)\, e^{-[\Lambda(t_2) - \Lambda(t_1)]}, \quad 0 \le t_1 < t_2.$$

By (3.19), page 128, the joint probability density of $(T_1, T_2)$ is

$$f(t_1, t_2) = \begin{cases} \lambda(t_1)\, f_{T_1}(t_2) & \text{for } t_1 < t_2, \\ 0, & \text{elsewhere.} \end{cases}$$

Starting with $f(t_1, t_2)$, one inductively obtains the joint density of $(T_1, T_2, \ldots, T_n)$:

$$f(t_1, t_2, \ldots, t_n) = \begin{cases} \lambda(t_1)\, \lambda(t_2) \cdots \lambda(t_{n-1})\, f_{T_1}(t_n) & \text{for } 0 \le t_1 < t_2 < \cdots < t_n, \\ 0, & \text{elsewhere.} \end{cases} \tag{7.49}$$

This result includes as a special case formula (7.25).


As with the homogeneous Poisson process, the nonhomogeneous Poisson counting process $\{N(t), t \ge 0\}$, the corresponding point process $\{T_1, T_2, \ldots\}$ of Poisson event times, and the sequence of interevent times $\{Y_1, Y_2, \ldots\}$ are statistically equivalent stochastic processes.


Figure 7.4 Intensity of the arrival of cars at a filling station (vertical axis: $\lambda(t)$; horizontal axis: $t$ from 5 to 11)


Example 7.6 From historical observations it is known that the number of cars arriving for petrol at a particular filling station weekdays between 5:00 and 11:00 a.m. can be modeled by a nonhomogeneous Poisson process $\{N(t), t \ge 0\}$ with intensity function (Figure 7.4)

$$\lambda(t) = 10 + 35.4\, (t - 5)\, e^{-(t-5)^2/8}, \quad 5 \le t \le 11.$$

1) What is the mean number of cars arriving for petrol weekdays between 5:00 and 11:00? According to (7.42), this mean number is

$$E(N(5, 11)) = \int_5^{11} \lambda(t)\, dt = \int_5^{11} \Big( 10 + 35.4\, (t - 5)\, e^{-(t-5)^2/8} \Big)\, dt = \Big[ 10\, t - 141.6\, e^{-(t-5)^2/8} \Big]_5^{11} \approx 200.$$


2) What is the probability that at least 90 cars arrive for petrol weekdays between 6:00 and 8:00? The mean number of cars arriving between 6:00 and 8:00 is

$$\int_6^8 \lambda(t)\, dt = \int_6^8 \Big( 10 + 35.4\, (t - 5)\, e^{-(t-5)^2/8} \Big)\, dt = \Big[ 10\, t - 141.6\, e^{-(t-5)^2/8} \Big]_6^8 \approx 99.$$

Hence, the random number of cars $N(6, 8) = N(8) - N(6)$ arriving between 6:00 and 8:00 has a Poisson distribution with parameter 99 so that the desired probability is

$$P(N(6, 8) \ge 90) = \sum_{n=90}^{\infty} \frac{99^n}{n!}\, e^{-99}.$$

By using the normal approximation to the Poisson distribution (page 213):

$$\sum_{n=90}^{\infty} \frac{99^n}{n!}\, e^{-99} \approx 1 - \Phi\Big( \frac{90 - 99}{\sqrt{99}} \Big) \approx 1 - 0.1827.$$

Therefore,

$$P(N(6, 8) \ge 90) \approx 0.8173.$$ □
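
The numbers in Example 7.6 can be reproduced with a few lines of code. The sketch below is an illustration with hypothetical function names; it evaluates the antiderivative used above, the normal approximation without continuity correction (as in the text), and, for comparison, the Poisson tail probability summed directly. The directly summed tail need not coincide with the normal approximation.

```python
import math

def big_lambda(t):
    """Antiderivative of lambda(t) = 10 + 35.4 (t - 5) exp(-(t - 5)^2 / 8)."""
    return 10.0 * t - 141.6 * math.exp(-((t - 5.0) ** 2) / 8.0)

def poisson_tail(mean, k):
    """P(N >= k) for N ~ Poisson(mean); terms are computed in log space."""
    below = sum(math.exp(i * math.log(mean) - mean - math.lgamma(i + 1))
                for i in range(k))
    return 1.0 - below

def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mean_5_11 = big_lambda(11) - big_lambda(5)    # mean number of cars, 5:00 to 11:00
mean_6_8 = big_lambda(8) - big_lambda(6)      # mean number of cars, 6:00 to 8:00
exact = poisson_tail(99.0, 90)                # tail of the Poisson(99) distribution
approx = 1.0 - std_normal_cdf((90 - 99) / math.sqrt(99))
print(mean_5_11, mean_6_8, exact, approx)
```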


7.2.3 Mixed Poisson Processes 


Mixed Poisson processes were introduced by J. Dubourdieu (1938) for modeling claim number processes in accident and sickness insurance. In view of their flexibility, they are now a favorite point process model for many other applications. A recent monograph on mixed Poisson processes is Grandell (1997).

Let $\{N(t), t \ge 0\}$ be a homogeneous Poisson process with intensity $\lambda$. To explicitly express the dependence of this process on $\lambda$, in this section the notation $\{N_\lambda(t), t \ge 0\}$ for the process $\{N(t), t \ge 0\}$ is adopted. The basic idea of Dubourdieu was to consider $\lambda$ a realization of a positive random variable $L$, which is called the (random) structure or mixing parameter. Correspondingly, the probability distribution of $L$ is called the structure or mixing distribution (section 2.4, pages 92 and 94).


Definition 7.4 Let $L$ be a positive random variable with range $R_L$. Then the counting process $\{N_L(t), t \ge 0\}$ is said to be a mixed Poisson process with structure parameter $L$ if it has the following properties:

(1) $\{N_{L \mid L=\lambda}(t), t \ge 0\}$ has independent, homogeneous increments for all $\lambda \in R_L$.

(2) $P(N_{L \mid L=\lambda}(t) = i) = \dfrac{(\lambda t)^i}{i!}\, e^{-\lambda t}$ for all $\lambda \in R_L$, $i = 0, 1, \ldots$.

Thus, on condition $L = \lambda$, the mixed Poisson process is a homogeneous Poisson process with parameter $\lambda$:

$$\{N_{L \mid L=\lambda}(t), t \ge 0\} = \{N_\lambda(t), t \ge 0\}.$$

The absolute state probabilities $p_i(t) = P(N_L(t) = i)$ of the mixed Poisson process at time $t$ are

$$P(N_L(t) = i) = E\Big( \frac{(L t)^i}{i!}\, e^{-L t} \Big); \quad i = 0, 1, \ldots. \tag{7.50}$$


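If $L$ is a discrete random variable with $P(L = \lambda_k) = \pi_k$; $k = 0, 1, \ldots$; then

$$P(N_L(t) = i) = \sum_{k=0}^{\infty} \frac{(\lambda_k t)^i}{i!}\, e^{-\lambda_k t}\, \pi_k. \tag{7.51}$$

In applications, a binary structure parameter $L$ is particularly important. In this case,

$$P(N_L(t) = i) = \frac{(\lambda_1 t)^i}{i!}\, e^{-\lambda_1 t}\, \pi + \frac{(\lambda_2 t)^i}{i!}\, e^{-\lambda_2 t}\, (1 - \pi) \tag{7.52}$$

for $0 < \pi < 1$, $\lambda_1 \ne \lambda_2$.

The basic results obtained in what follows do not depend on the probability distribution of $L$. Hence, for convenience, throughout this section the assumption is made that $L$ is a continuous random variable with density $f_L(\lambda)$. Then,

$$p_i(t) = \int_0^{\infty} \frac{(\lambda t)^i}{i!}\, e^{-\lambda t}\, f_L(\lambda)\, d\lambda; \quad i = 0, 1, \ldots$$

Obviously, the probability $p_0(t) = P(N_L(t) = 0)$ is the Laplace transform of $f_L(\lambda)$ with parameter $s = t$ (page 99):

$$p_0(t) = \hat{f}_L(t) = E(e^{-L t}) = \int_0^{\infty} e^{-\lambda t}\, f_L(\lambda)\, d\lambda.$$

The $i$th derivative of $p_0(t)$ is

$$\frac{d^i p_0(t)}{dt^i} = p_0^{(i)}(t) = \int_0^{\infty} (-\lambda)^i\, e^{-\lambda t}\, f_L(\lambda)\, d\lambda.$$

Therefore, all state probabilities of a mixed Poisson process can be written in terms of $p_0(t)$:

$$p_i(t) = P(N_L(t) = i) = (-1)^i\, \frac{t^i}{i!}\, p_0^{(i)}(t); \quad i = 1, 2, \ldots. \tag{7.53}$$

Mean value and variance of $N_L(t)$ are (compare with the parameters of the mixed Poisson distribution given by formulas (2.108), page 94):

$$E(N_L(t)) = t\, E(L), \quad \operatorname{Var}(N_L(t)) = t\, E(L) + t^2 \operatorname{Var}(L). \tag{7.54}$$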
The following theorem lists two important properties of mixed Poisson processes.

Theorem 7.6 (1) A mixed Poisson process $\{N_L(t), t \ge 0\}$ has homogeneous increments.

(2) If $L$ is not a constant (i.e. the structure distribution is not degenerate), then the increments of the mixed Poisson process $\{N_L(t), t \ge 0\}$ are not independent.


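Proof (1) Let $0 = t_0 < t_1 < \cdots < t_n$; $n = 1, 2, \ldots$. Then, for any nonnegative integers $i_1, i_2, \ldots, i_n$,

$$P(N_L(t_{k-1} + \tau,\ t_k + \tau) = i_k;\ k = 1, 2, \ldots, n)$$

$$= \int_0^{\infty} P(N_\lambda(t_{k-1} + \tau,\ t_k + \tau) = i_k;\ k = 1, 2, \ldots, n)\, f_L(\lambda)\, d\lambda$$

$$= \int_0^{\infty} P(N_\lambda(t_{k-1}, t_k) = i_k;\ k = 1, 2, \ldots, n)\, f_L(\lambda)\, d\lambda$$

$$= P(N_L(t_{k-1}, t_k) = i_k;\ k = 1, 2, \ldots, n).$$

(2) Let $0 \le t_1 < t_2 < t_3$. Then,

$$P(N_L(t_1, t_2) = i_1,\ N_L(t_2, t_3) = i_2)$$

$$= \int_0^{\infty} P(N_\lambda(t_1, t_2) = i_1,\ N_\lambda(t_2, t_3) = i_2)\, f_L(\lambda)\, d\lambda$$

$$= \int_0^{\infty} P(N_\lambda(t_1, t_2) = i_1)\, P(N_\lambda(t_2, t_3) = i_2)\, f_L(\lambda)\, d\lambda$$

$$\ne \int_0^{\infty} P(N_\lambda(t_1, t_2) = i_1)\, f_L(\lambda)\, d\lambda\ \int_0^{\infty} P(N_\lambda(t_2, t_3) = i_2)\, f_L(\lambda)\, d\lambda$$

$$= P(N_L(t_1, t_2) = i_1)\, P(N_L(t_2, t_3) = i_2).$$

This proves the theorem if the mixing parameter $L$ is a continuous random variable. If $L$ is discrete, the same pattern applies. ■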


Multinomial Criterion Let $0 = t_0 < t_1 < \cdots < t_n$; $n = 1, 2, \ldots$. Then, for any nonnegative integers $i_1, i_2, \ldots, i_n$ with $i = i_1 + i_2 + \cdots + i_n$,

$$P(N_L(t_{k-1}, t_k) = i_k;\ k = 1, 2, \ldots, n \mid N_L(t_n) = i) = \frac{i!}{i_1!\, i_2! \cdots i_n!} \Big( \frac{t_1}{t_n} \Big)^{i_1} \Big( \frac{t_2 - t_1}{t_n} \Big)^{i_2} \cdots \Big( \frac{t_n - t_{n-1}}{t_n} \Big)^{i_n}. \tag{7.55}$$

Interestingly, this conditional probability does not depend on the structure distribution (compare to theorem 7.5). Although the derivation of the multinomial criterion is elementary, it is not done here (Exercise 7.17).


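As an application of the multinomial criterion (7.55), the joint distribution of the increments $N_L(0, t) = N_L(t)$ and $N_L(t, t + \tau)$ will be derived:

$$P(N_L(t) = i,\ N_L(t, t + \tau) = k) = P(N_L(t) = i \mid N_L(t + \tau) = i + k)\, P(N_L(t + \tau) = i + k)$$

$$= \frac{(i + k)!}{i!\, k!} \Big( \frac{t}{t + \tau} \Big)^i \Big( \frac{\tau}{t + \tau} \Big)^k \int_0^{\infty} \frac{[\lambda (t + \tau)]^{i+k}}{(i + k)!}\, e^{-\lambda (t + \tau)}\, f_L(\lambda)\, d\lambda.$$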
Hence, the joint distribution is for $i, k = 0, 1, \ldots$,

$$P(N_L(0, t) = i,\ N_L(t, t + \tau) = k) = \frac{t^i\, \tau^k}{i!\, k!} \int_0^{\infty} \lambda^{i+k}\, e^{-\lambda (t + \tau)}\, f_L(\lambda)\, d\lambda. \tag{7.56}$$

Since a mixed Poisson process has dependent increments, it is important to get information on the nature and strength of the statistical dependence between two neighboring increments. As a first step in this direction, the mean value of the product of the increments $N_L(t) = N_L(0, t)$ and $N_L(t, t + \tau)$ has to be determined. From formula (7.56),

$$E([N_L(t)]\, [N_L(t, t + \tau)]) = \sum_{i=1}^{\infty} \sum_{k=1}^{\infty} i\, k\, \frac{t^i\, \tau^k}{i!\, k!} \int_0^{\infty} \lambda^{i+k}\, e^{-\lambda (t + \tau)}\, f_L(\lambda)\, d\lambda$$

$$= t\, \tau \int_0^{\infty} \lambda^2\, e^{\lambda t}\, e^{\lambda \tau}\, e^{-\lambda (t + \tau)}\, f_L(\lambda)\, d\lambda$$

$$= t\, \tau \int_0^{\infty} \lambda^2\, f_L(\lambda)\, d\lambda$$

so that

$$E([N_L(t)]\, [N_L(t, t + \tau)]) = t\, \tau\, E(L^2). \tag{7.57}$$

Hence, in view of formula (6.4), page 226,

$$\operatorname{Cov}(N_L(t),\ N_L(t, t + \tau)) = t\, \tau \operatorname{Var}(L).$$

Thus, two neighboring increments of a mixed Poisson process are positively correlated. Consequently, a large number of events in an interval will on average induce a large number of events in the following interval ('large' relative to the respective lengths of these intervals). This property of a stochastic process is also called positive contagion.
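
Positive contagion can be made visible by simulation. The following sketch is illustrative only: it assumes a binary structure parameter $L$ taking the values 0.5 and 3.0 with probability 1/2 each, and the function names are hypothetical. It estimates the covariance of the increments over $(0, t]$ and $(t, t + \tau]$ and compares it with $t \tau \operatorname{Var}(L)$.

```python
import random

def increments(t, tau, rng, rates=(0.5, 3.0), pi=0.5):
    """Draw the structure parameter first, then count the events of the
    resulting homogeneous Poisson process in (0, t] and (t, t + tau]."""
    lam = rates[0] if rng.random() < pi else rates[1]
    a = b = 0
    s = rng.expovariate(lam)
    while s <= t + tau:
        if s <= t:
            a += 1
        else:
            b += 1
        s += rng.expovariate(lam)
    return a, b

def sample_covariance(t=2.0, tau=1.0, runs=40000, seed=3):
    """Sample covariance of the two neighboring increments."""
    rng = random.Random(seed)
    pairs = [increments(t, tau, rng) for _ in range(runs)]
    ma = sum(a for a, _ in pairs) / runs
    mb = sum(b for _, b in pairs) / runs
    return sum((a - ma) * (b - mb) for a, b in pairs) / (runs - 1)

cov_mc = sample_covariance()
# Var(L) = E(L^2) - E(L)^2 for the assumed two-point distribution
var_L = 0.5 * 0.5 ** 2 + 0.5 * 3.0 ** 2 - (0.5 * 0.5 + 0.5 * 3.0) ** 2
cov_th = 2.0 * 1.0 * var_L      # t * tau * Var(L)
print(cov_mc, cov_th)
```

With a degenerate structure distribution (both rates equal) the same code should return a covariance near zero, in line with the independent increments of a homogeneous Poisson process.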


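Pólya Process A mixed Poisson process with a gamma distributed structure parameter $L$ is called a Pólya process (or Pólya-Lundberg process).

Let the gamma density of $L$ be

$$f_L(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha - 1}\, e^{-\beta \lambda}, \quad \lambda > 0,\ \alpha > 0,\ \beta > 0.$$

Then, proceeding as in example 2.24 (page 95) yields

$$P(N_L(t) = i) = \int_0^{\infty} \frac{(\lambda t)^i}{i!}\, e^{-\lambda t}\, \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha - 1}\, e^{-\beta \lambda}\, d\lambda = \frac{\Gamma(i + \alpha)}{i!\, \Gamma(\alpha)}\, \frac{t^i\, \beta^\alpha}{(\beta + t)^{i + \alpha}}.$$

Hence,

$$P(N_L(t) = i) = \binom{i - 1 + \alpha}{i} \Big( \frac{t}{\beta + t} \Big)^i \Big( \frac{\beta}{\beta + t} \Big)^\alpha; \quad i = 0, 1, \ldots. \tag{7.58}$$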




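Thus, the one-dimensional distribution of the Pólya process $\{N_L(t), t \ge 0\}$ is a negative binomial distribution with parameters $r = \alpha$ and $p = t/(\beta + t)$. In particular, for an exponential structure distribution ($\alpha = 1$), $N_L(t)$ has a geometric distribution with parameter $p = t/(t + \beta)$.

To determine the $n$-dimensional distribution of the Pólya process, the multinomial criterion (7.55) and the absolute state distribution (7.58) are used:

For $0 = t_0 < t_1 < \cdots < t_n$; $n = 1, 2, \ldots$ and $i_0 = 0$,

$$P(N_L(t_k) = i_k;\ k = 1, 2, \ldots, n)$$

$$= P(N_L(t_k) = i_k;\ k = 1, 2, \ldots, n \mid N_L(t_n) = i_n)\, P(N_L(t_n) = i_n)$$

$$= P(N_L(t_{k-1}, t_k) = i_k - i_{k-1};\ k = 1, 2, \ldots, n \mid N_L(t_n) = i_n)\, P(N_L(t_n) = i_n)$$

$$= \frac{i_n!}{\prod_{k=1}^{n} (i_k - i_{k-1})!}\ \prod_{k=1}^{n} \Big( \frac{t_k - t_{k-1}}{t_n} \Big)^{i_k - i_{k-1}} \binom{i_n - 1 + \alpha}{i_n} \Big( \frac{t_n}{\beta + t_n} \Big)^{i_n} \Big( \frac{\beta}{\beta + t_n} \Big)^\alpha.$$

After some algebra, the $n$-dimensional distribution of the Pólya process becomes

$$P(N_L(t_k) = i_k;\ k = 1, 2, \ldots, n) = \frac{i_n!}{\prod_{k=1}^{n} (i_k - i_{k-1})!} \binom{i_n - 1 + \alpha}{i_n} \Big( \frac{\beta}{\beta + t_n} \Big)^\alpha\ \prod_{k=1}^{n} \Big( \frac{t_k - t_{k-1}}{\beta + t_n} \Big)^{i_k - i_{k-1}}. \tag{7.59}$$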


For the following three reasons it is not surprising that the Pólya process is increasingly used for modeling real-life point processes, in particular customer flows:

1) The finite dimensional distributions of this process are explicitly available.

2) Dependent increments occur more frequently than independent ones.

3) The two free parameters $\alpha$ and $\beta$ of this process allow its adaptation to a wide variety of data sets.


Example 7.7 An insurance company analyzed the incoming flow of claims and found that the arrival intensity $\lambda$ is subject to random fluctuations, which can be modeled by the probability density $f_L(\lambda)$ of a gamma distributed random variable $L$ with mean value $E(L) = 0.24$ and variance $\operatorname{Var}(L) = 0.16$ (unit: working hour). The parameters $\alpha$ and $\beta$ of this gamma distribution are obtained from

$$E(L) = 0.24 = \alpha/\beta, \quad \operatorname{Var}(L) = 0.16 = \alpha/\beta^2.$$

Hence, $\alpha = 0.36$ and $\beta = 1.5$. Thus, $L$ has density

$$f_L(\lambda) = \frac{(1.5)^{0.36}}{\Gamma(0.36)}\, \lambda^{-0.64}\, e^{-(1.5)\lambda}, \quad \lambda > 0.$$

In time intervals in which the arrival rate was nearly constant, the flow of claims behaved like a homogeneous Poisson process. Hence, the insurance company modeled the incoming flow of claims by a Pólya process $\{N_L(t), t \ge 0\}$ with the one-dimensional probability distribution

$$P(N_L(t) = i) = \binom{i - 0.64}{i} \Big( \frac{t}{1.5 + t} \Big)^i \Big( \frac{1.5}{1.5 + t} \Big)^{0.36}; \quad i = 0, 1, \ldots.$$

By (7.54), mean value and variance of $N_L(t)$ are

$$E(N_L(t)) = 0.24\, t, \quad \operatorname{Var}(N_L(t)) = 0.24\, t + 0.16\, t^2.$$

As illustrated by this example, the Pólya process (as any other mixed Poisson process) is a more appropriate model than a homogeneous Poisson process with intensity $\lambda = E(L)$ for fitting claim number developments which exhibit an increasing variability with increasing $t$. □
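
The distribution of Example 7.7 can be evaluated numerically. The sketch below is an illustration with hypothetical names; the time point $t = 10$ and the truncation of the series after 3000 terms are assumptions made for the example. The generalized binomial coefficient in (7.58) is computed through the log-gamma function, and the resulting mean and variance are compared with (7.54).

```python
import math

def polya_pmf(i, t, alpha, beta):
    """P(N_L(t) = i) according to (7.58). The coefficient
    Gamma(i + alpha) / (i! Gamma(alpha)) is evaluated in log space."""
    log_coeff = math.lgamma(i + alpha) - math.lgamma(i + 1) - math.lgamma(alpha)
    return math.exp(log_coeff
                    + i * math.log(t / (beta + t))
                    + alpha * math.log(beta / (beta + t)))

alpha, beta, t = 0.36, 1.5, 10.0
probs = [polya_pmf(i, t, alpha, beta) for i in range(3000)]  # truncated series
total = sum(probs)
mean = sum(i * p for i, p in enumerate(probs))
var = sum(i * i * p for i, p in enumerate(probs)) - mean ** 2
print(total, mean, var)   # compare with 1, 0.24 t and 0.24 t + 0.16 t^2
```

For $t = 10$ the variance is already several times the mean, whereas a homogeneous Poisson process with intensity $E(L)$ would have variance equal to its mean.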


Doubly Stochastic Poisson Process The mixed Poisson process generalizes the homogeneous Poisson process by replacing its parameter $\lambda$ with a random variable $L$. The corresponding generalization of the nonhomogeneous Poisson process leads to the concept of a doubly stochastic Poisson process. A doubly stochastic Poisson process $\{N_{L(\cdot)}(t), t \ge 0\}$ can be thought of as a nonhomogeneous Poisson process the intensity function $\lambda(t)$ of which has been replaced with a stochastic process $\{L(t), t \ge 0\}$ called intensity process. Thus, a sample path of a doubly stochastic Poisson process $\{N_{L(\cdot)}(t), t \ge 0\}$ can be generated as follows:

1) A sample path $\{\lambda(t), t \ge 0\}$ of a given intensity process $\{L(t), t \ge 0\}$ is simulated according to the probability distribution of $\{L(t), t \ge 0\}$.

2) Given $\{\lambda(t), t \ge 0\}$, the process $\{N_{L(\cdot)}(t), t \ge 0\}$ evolves like a nonhomogeneous Poisson process with intensity function $\lambda(t)$.

Thus, a doubly stochastic Poisson process $\{N_{L(\cdot)}(t), t \ge 0\}$ is generated by two independent 'stochastic mechanisms'.


The absolute state probabilities of the doubly stochastic Poisson process at time $t$ are

$$P(N_{L(\cdot)}(t) = i) = \frac{1}{i!}\, E\Big( \Big[ \int_0^t L(x)\, dx \Big]^i e^{-\int_0^t L(x)\, dx} \Big); \quad i = 0, 1, \ldots. \tag{7.60}$$

In this formula, the mean value operation '$E$' eliminates the randomness generated by the intensity process in $[0, t]$.

The trend function of $\{N_{L(\cdot)}(t), t \ge 0\}$ is

$$m(t) = E\Big( \int_0^t L(x)\, dx \Big) = \int_0^t E(L(x))\, dx, \quad t \ge 0.$$

A nonhomogeneous Poisson process with intensity function $\lambda(t) = E(L(t))$ can be used as an approximation to the doubly stochastic Poisson process $\{N_{L(\cdot)}(t), t \ge 0\}$.

The doubly stochastic Poisson process becomes
1. the homogeneous Poisson process if $L(t)$ is equal to a constant $\lambda$ for all $t > 0$,
2. the nonhomogeneous Poisson process if $L(t)$ is a deterministic function $\lambda(t)$, $t \ge 0$,
3. the mixed Poisson process if $L(t)$ is a random variable $L$ which does not depend on $t$.


The two 'degrees of freedom' of a doubly stochastic Poisson process make it a universal point process model. The term 'doubly stochastic Poisson process' was introduced by D. R. Cox, who was the first to investigate this class of point processes. Hence, these processes are also called Cox processes. For detailed treatments and applications in engineering, insurance, and in other fields see Snyder (1975) and Grandell (1997).


7.2.4 Superposition and Thinning of Poisson Processes 


7.2.4.1 Superposition 


Assume that a service station recruits its customers from n independent sources. For 
instance, a branch of a bank serves customers from n different towns, or a car work- 
shop repairs and maintains n different makes of cars, or the service station is a water- 
ing place in a game reserve, which is visited by n different species of animals. Each 
town, each make of cars, and each species generates its own arrival process. Let 


$$\{N_i(t), t \ge 0\}; \quad i = 1, 2, \ldots, n,$$

be the corresponding counting processes. Then, the total number of customers arriving at the service station in $[0, t]$ is

$$N(t) = N_1(t) + N_2(t) + \cdots + N_n(t).$$

$\{N(t), t \ge 0\}$ can be thought of as the counting process of a marked point process, where the marks indicate from which source the customers come.

On condition that $\{N_i(t), t \ge 0\}$ is a homogeneous Poisson process with parameter $\lambda_i$; $i = 1, 2, \ldots, n$, what type of counting process is $\{N(t), t \ge 0\}$?

From example 4.18 (page 180) it is known that the z-transform of $N(t)$ is

$$M_{N(t)}(z) = e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_n)\, t\, (z - 1)}.$$

Therefore, $N(t)$ has a Poisson distribution with parameter

$$(\lambda_1 + \lambda_2 + \cdots + \lambda_n)\, t.$$


Since the counting processes $\{N_i(t), t \ge 0\}$ have homogeneous and independent increments, their additive superposition $\{N(t), t \ge 0\}$ also has homogeneous and independent increments. This proves the following theorem.

Theorem 7.7 The additive superposition $\{N(t), t \ge 0\}$ of $n$ independent, homogeneous Poisson processes $\{N_i(t), t \ge 0\}$ with intensities $\lambda_i$; $i = 1, 2, \ldots, n$; is a homogeneous Poisson process with intensity

$$\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$ ■

Quite analogously, if $\{N_i(t), t \ge 0\}$ are independent nonhomogeneous Poisson processes with intensity functions $\lambda_i(t)$; $i = 1, 2, \ldots, n$; then their additive superposition $\{N(t), t \ge 0\}$ is a nonhomogeneous Poisson process with intensity function

$$\lambda(t) = \lambda_1(t) + \lambda_2(t) + \cdots + \lambda_n(t).$$


7.2.4.2 Thinning 
There are many situations in which not superposition, but the opposite operation, namely thinning or splitting, of a Poisson process occurs. For instance, a cosmic particle counter registers only $\alpha$-particles and ignores other types of particles, a reinsurance company is only interested in claims the size of which exceeds, say, one million dollars, or a game ranger counts only the number of rhinos which arrive at a watering place per day. Formally, a marked point process $\{(T_1, M_1), (T_2, M_2), \ldots\}$ arrives and only events with special marks will be taken into account. It is assumed that the marks $M_i$ are independent of each other and independent of $\{T_1, T_2, \ldots\}$, and that they are identically distributed as

$$M = \begin{cases} m_1 & \text{with probability } 1 - p, \\ m_2 & \text{with probability } p, \end{cases}$$

i.e., the mark space only consists of two elements: $\mathbf{M} = \{m_1, m_2\}$. In this case, there are two different types of Poisson events: type 1-events (attached with mark $m_1$) and type 2-events (attached with mark $m_2$).

Of what kind is the arising point process if only type 1-events are counted?


Let $Y$ be the first event time with mark $m_2$. If $t < T_1$, then there is surely no type 2-event in $[0, t]$, and if $T_n \le t < T_{n+1}$, then there are exactly $n$ events in $[0, t]$ and $(1 - p)^n$ is the probability that none of them is a type 2-event. Hence,

$$P(Y > t) = P(0 < t < T_1) + \sum_{n=1}^{\infty} P(T_n \le t < T_{n+1})\, (1 - p)^n.$$

Since $P(T_n \le t < T_{n+1}) = P(N(t) = n)$,

$$P(Y > t) = e^{-\lambda t} + \sum_{n=1}^{\infty} \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}\, (1 - p)^n$$

$$= e^{-\lambda t} + e^{-\lambda t} \sum_{n=1}^{\infty} \frac{[\lambda (1 - p)\, t]^n}{n!} = e^{-\lambda t} + e^{-\lambda t} \big[ e^{\lambda (1 - p)\, t} - 1 \big].$$

Hence,

$$P(Y > t) = e^{-\lambda p t}, \quad t \ge 0. \tag{7.61}$$

Hence, the interevent times between type 2-events have an exponential distribution with parameter $p \lambda$. Moreover, in view of our assumptions, these interevent times are independent. By changing the roles of type 1- and type 2-events, theorem 7.2 implies theorem 7.8:


Theorem 7.8 Consider a homogeneous Poisson process $\{N(t), t \ge 0\}$ with intensity $\lambda$ and two types of Poisson events 1 and 2, which occur independently with respective probabilities $1 - p$ and $p$. Then $N(t)$ can be represented in the form

$$N(t) = N_1(t) + N_2(t),$$

where $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are two independent homogeneous Poisson processes with intensities $(1 - p)\lambda$ and $p \lambda$, which count only type 1- and type 2-events, respectively. ■


From this theorem one obtains by induction the following corollary, which is the analogue to theorem 7.7:

Corollary Let $\{(T_1, M_1), (T_2, M_2), \ldots\}$ be a marked point process with the marks $M_i$ being independent of each other and identically distributed as $M$:

$$P(M = m_i) = p_i; \quad i = 1, 2, \ldots, n, \quad \sum_{i=1}^{n} p_i = 1.$$

The underlying point process $\{T_1, T_2, \ldots\}$ is assumed to be Poisson with intensity $\lambda$.

If only events with mark $m_i$ are counted, then the arising point process is a Poisson process with intensity $\lambda p_i$, $i = 1, 2, \ldots, n$.
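
Theorem 7.8 can be illustrated by splitting simulated event streams. The following sketch uses assumed values ($\lambda = 5$, $p = 0.3$, $t = 2$) and hypothetical function names. It checks that the type 2 count has mean and variance close to $p \lambda t$, as a Poisson distributed count should, and that the sample covariance of the two counts is close to zero, which is consistent with (but does not prove) their independence.

```python
import random

def thinned_counts(lam, p, t, rng):
    """Counts of type 1 and type 2 events in [0, t]; every event of the
    rate-lam Poisson process is of type 2 with probability p."""
    n1 = n2 = 0
    s = rng.expovariate(lam)
    while s <= t:
        if rng.random() < p:
            n2 += 1
        else:
            n1 += 1
        s += rng.expovariate(lam)
    return n1, n2

def summarize(lam=5.0, p=0.3, t=2.0, runs=30000, seed=5):
    """Sample means of both counts, variance of N_2(t), and covariance."""
    rng = random.Random(seed)
    pairs = [thinned_counts(lam, p, t, rng) for _ in range(runs)]
    m1 = sum(a for a, _ in pairs) / runs
    m2 = sum(b for _, b in pairs) / runs
    v2 = sum((b - m2) ** 2 for _, b in pairs) / (runs - 1)
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / (runs - 1)
    return m1, m2, v2, cov

m1, m2, v2, cov = summarize()
print(m1, m2, v2, cov)   # compare with (1 - p) lam t, p lam t, p lam t, 0
```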


Nonhomogeneous Poisson Process Now the situation is partially generalized by assuming that the underlying counting process $\{N(t), t \ge 0\}$ is a nonhomogeneous Poisson process with intensity function $\lambda(t)$. The $i$th Poisson event occurring at time $T_i$ comes with a random mark $M_i$, where the $\{M_1, M_2, \ldots\}$ are independent and have the following probability distribution:

$$M_i = \begin{cases} m_1 & \text{with probability } 1 - p(t), \\ m_2 & \text{with probability } p(t), \end{cases} \quad \text{given that } T_i = t;\ i = 1, 2, \ldots$$

Note that the $M_i$ are no longer identically distributed. Again, an event coming with mark $m_i$ is called a type $i$-event, $i = 1, 2$.

Let $Y$ be the time to the first occurrence of a type 2-event, $G(t) = P(Y \le t)$ its distribution function, and $\bar{G}(t) = 1 - G(t)$. Then the relationship

$$P(t < Y \le t + \Delta t \mid Y > t) = p(t)\, \lambda(t)\, \Delta t + o(\Delta t)$$

implies

$$\frac{1}{\bar{G}(t)} \cdot \frac{G(t + \Delta t) - G(t)}{\Delta t} = p(t)\, \lambda(t) + \frac{o(\Delta t)}{\Delta t}.$$

Letting $\Delta t$ tend to 0 yields

$$\frac{G'(t)}{\bar{G}(t)} = p(t)\, \lambda(t).$$

By integration,

$$\bar{G}(t) = e^{-\int_0^t p(x)\, \lambda(x)\, dx}, \quad t \ge 0. \tag{7.62}$$

If $p(x) \equiv p$ and $\lambda(x) \equiv \lambda$, then (7.62) becomes (7.61).


Theorem 7.9 Let $\{N(t), t \ge 0\}$ be a nonhomogeneous Poisson process with intensity function $\lambda(t)$ and two types of events 1 and 2, which occur independently with respective probabilities $1 - p(t)$ and $p(t)$ if $t$ is an event time. Then $N(t)$ can be represented in the form

$$N(t) = N_1(t) + N_2(t),$$

where $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are independent nonhomogeneous Poisson processes with intensity functions $(1 - p(t))\, \lambda(t)$ and $p(t)\, \lambda(t)$, which count only type 1- or type 2-events, respectively. ■
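
One practical use of theorem 7.9 is the generation of a nonhomogeneous Poisson process from a homogeneous one: if the underlying process has constant intensity $\lambda_{\max}$ and an event at time $t$ is retained with probability $p(t) = \lambda(t)/\lambda_{\max}$, then the retained events form a nonhomogeneous Poisson process with intensity function $\lambda(t)$. The sketch below applies this idea to the intensity function of Example 7.6. The bound $\lambda_{\max} = 60$, the number of runs and the function names are assumptions made for illustration.

```python
import math
import random

LAM_MAX = 60.0   # assumed upper bound of the intensity on [5, 11]

def intensity(t):
    """Intensity function of Example 7.6."""
    return 10.0 + 35.4 * (t - 5.0) * math.exp(-((t - 5.0) ** 2) / 8.0)

def nhpp_event_times(start, end, rng):
    """Thin a homogeneous Poisson process with rate LAM_MAX: an event at
    time s is kept with probability intensity(s) / LAM_MAX."""
    times, s = [], start + rng.expovariate(LAM_MAX)
    while s <= end:
        if rng.random() < intensity(s) / LAM_MAX:
            times.append(s)
        s += rng.expovariate(LAM_MAX)
    return times

def mean_counts(runs=2000, seed=9):
    """Average numbers of retained events in (5, 11] and in (6, 8]."""
    rng = random.Random(seed)
    total = morning = 0
    for _ in range(runs):
        times = nhpp_event_times(5.0, 11.0, rng)
        total += len(times)
        morning += sum(1 for x in times if 6.0 < x <= 8.0)
    return total / runs, morning / runs

mean_total, mean_6_8 = mean_counts()
print(mean_total, mean_6_8)   # Example 7.6 gives about 200 and about 99
```

The bound must dominate the intensity on the whole interval; otherwise the retention probability would exceed 1 and the thinned process would no longer have the intended intensity function.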


7.2.5 Compound Poisson Processes 


Let $\{(T_i, M_i);\ i = 1, 2, \ldots\}$ be a marked point process, where $\{T_i;\ i = 1, 2, \ldots\}$ is a Poisson point process with corresponding counting process $\{N(t), t \ge 0\}$. Then the stochastic process $\{C(t), t \ge 0\}$ defined by

$$C(t) = \sum_{i=0}^{N(t)} M_i$$

with $M_0 = 0$ is called a compound (cumulative, aggregate) Poisson process.
Compound Poisson processes occur in many situations:

1) If $T_i$ is the time point at which the $i$th customer arrives at an insurance company and $M_i$ is its claim size, then $C(t)$ is the total claim amount the company is confronted with in the time interval $[0, t]$.

2) If $T_i$ is the time of the $i$th breakdown of a machine and $M_i$ the corresponding repair cost, then $C(t)$ is the total repair cost in $[0, t]$.

3) If $T_i$ is the time point the $i$th shock occurs and $M_i$ the amount of (mechanical) wear which this shock contributes to the degree of wear of an item, then $C(t)$ is the total wear the item has experienced up to time $t$. (For the brake discs of a car, every application of the brakes is a shock, which increases their degree of mechanical wear. For the tires of the undercarriage of an aircraft, every takeoff and every touchdown is a shock, which diminishes their tread depth.)

In what follows, $\{N(t), t \ge 0\}$ is assumed to be a homogeneous Poisson process with intensity $\lambda$. If the $M_i$ are independent and identically distributed as $M$ and independent of $\{T_1, T_2, \ldots\}$, then $\{C(t), t \ge 0\}$ has the following properties:

1) $\{C(t), t \ge 0\}$ has independent and homogeneous increments.


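2) The Laplace transform of $C(t)$ is

$$\hat{C}_t(s) = e^{\lambda t\, [\hat{M}(s) - 1]}, \tag{7.63}$$

where

$$\hat{M}(s) = E(e^{-sM})$$

is the Laplace transform of $M$. The proof of (7.63) is straightforward: By (2.118) at page 99,

$$\hat{C}_t(s) = E\big( e^{-s\, C(t)} \big) = E\big( e^{-s\, (M_0 + M_1 + M_2 + \cdots + M_{N(t)})} \big)$$

$$= \sum_{n=0}^{\infty} E\big( e^{-s\, (M_0 + M_1 + M_2 + \cdots + M_n)} \big)\, P(N(t) = n)$$

$$= \sum_{n=0}^{\infty} \big[ E(e^{-sM}) \big]^n\, \frac{(\lambda t)^n}{n!}\, e^{-\lambda t} = e^{-\lambda t} \sum_{n=0}^{\infty} \frac{[\lambda t\, \hat{M}(s)]^n}{n!}$$

$$= e^{\lambda t\, [\hat{M}(s) - 1]}.$$

From $\hat{C}_t(s)$, all the moments of $C(t)$ can be obtained by making use of (2.119). In particular, mean value and variance of $C(t)$ are

$$E(C(t)) = \lambda t\, E(M), \quad \operatorname{Var}(C(t)) = \lambda t\, E(M^2). \tag{7.64}$$

Hint These formulas can also be derived by formulas (4.74) and (4.75), page 194.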
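
The moment formulas (7.64) can be checked by simulation. In the sketch below the marks are assumed to be exponentially distributed with mean 2, so that $E(M) = 2$ and $E(M^2) = 8$; this choice, the values $\lambda = 3$, $t = 2$ and the function names are illustrative assumptions.

```python
import random

def compound_poisson(lam, t, mark_mean, rng):
    """C(t) = M_1 + ... + M_N(t) for a rate-lam Poisson process with
    exponentially distributed marks (an assumed example distribution)."""
    total, s = 0.0, rng.expovariate(lam)
    while s <= t:
        total += rng.expovariate(1.0 / mark_mean)
        s += rng.expovariate(lam)
    return total

def estimate_moments(lam=3.0, t=2.0, mark_mean=2.0, runs=20000, seed=13):
    """Sample mean and sample variance of C(t)."""
    rng = random.Random(seed)
    xs = [compound_poisson(lam, t, mark_mean, rng) for _ in range(runs)]
    mean = sum(xs) / runs
    var = sum((x - mean) ** 2 for x in xs) / (runs - 1)
    return mean, var

mean_mc, var_mc = estimate_moments()
mean_th = 3.0 * 2.0 * 2.0                 # lambda * t * E(M)
var_th = 3.0 * 2.0 * (2 * 2.0 ** 2)       # lambda * t * E(M^2), E(M^2) = 2 * mean^2
print(mean_mc, mean_th, var_mc, var_th)
```

Note that the variance involves the second moment $E(M^2)$, not the variance of $M$, because the random number of summands contributes additional variability.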


Now the compound Poisson process is considered on condition that $M$ has a Bernoulli
distribution:

$$M = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1 - p. \end{cases}$$

Then $M_1 + M_2 + \cdots + M_n$ as a sum of independent and Bernoulli distributed random
variables is binomially distributed with parameters $n$ and $p$ (page 49). Hence,

$$\begin{aligned}
P(C(t) = k) &= \sum_{n=0}^{\infty} P(M_0 + M_1 + \cdots + M_n = k \mid N(t) = n)\, P(N(t) = n) \\
&= \sum_{n=k}^{\infty} \binom{n}{k} p^k (1 - p)^{n-k}\, \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}.
\end{aligned}$$

This is a mixture of binomial distributions with regard to a Poisson structure distribution.
Hence, by example 2.24 (page 93), $C(t)$ has a Poisson distribution with parameter
$\lambda p t$:

$$P(C(t) = k) = \frac{(\lambda p t)^k}{k!}\, e^{-\lambda p t}, \quad k = 0, 1, \ldots.$$

Corollary If the marks of a compound Poisson process $\{C(t),\ t \ge 0\}$ have a Bernoulli
distribution with parameter $p$, then $\{C(t),\ t \ge 0\}$ arises by thinning a homogeneous
Poisson process with parameter $\lambda$.
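
The thinning statement can be checked numerically by summing the Poisson mixture of binomial probabilities and comparing it with the Poisson probabilities with parameter lambda*p*t. The sketch below is only an illustration under assumed parameter values; the truncation point n_max and all function names are choices made here, not part of the text.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution, evaluated in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with parameters n and p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def thinned_pmf(k, lam, t, p, n_max=200):
    """Sum over n of P(Bin(n, p) = k) * P(N(t) = n), truncated at n_max (assumed cutoff)."""
    return sum(binomial_pmf(k, n, p) * poisson_pmf(n, lam * t) for n in range(k, n_max + 1))

lam, t, p = 2.0, 3.0, 0.3   # assumed illustration values
max_gap = max(abs(thinned_pmf(k, lam, t, p) - poisson_pmf(k, lam * p * t)) for k in range(15))
print(max_gap)   # expected to be at the level of floating-point rounding
```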


If the underlying counting process $\{N(t),\ t \ge 0\}$ is a nonhomogeneous Poisson process
with intensity function $\lambda(t)$ and integrated intensity function $\Lambda(t) = \int_0^t \lambda(x)\, dx$, then
(7.63) and (7.64) become in this order

$$\hat{C}_t(s) = e^{\Lambda(t)\,[\hat{M}(s) - 1]},$$

$$E(C(t)) = \Lambda(t)\, E(M), \tag{7.65}$$

$$\operatorname{Var}(C(t)) = \Lambda(t)\, E(M^2).$$


Again, these formulas are an immediate consequence of (4.74) and (4.75). 


7.2.6 Applications to Maintenance 


The nonhomogeneous Poisson process is an important mathematical tool for model- 
ing and optimizing the maintenance of technical systems with respect to cost and reli- 
ability criteria by applying proper maintenance policies (strategies). Maintenance 
policies prescribe when to carry out (preventive) repairs, replacements, inspections, 
or other maintenance measures. Repairs after system failures usually only tackle the 
causes which triggered off the failures. A minimal repair performed after a failure 
enables the system to continue its work but it does not affect the failure rate (2.56) 
(page 88) of the system. In other words, after a minimal repair the failure rate of the 
system has the same value as immediately before a failure. For example, if a failure 
of a complicated electronic system is caused by a defective plug and socket connec- 
tion, then removing this cause of failure can be considered a minimal repair. Preven- 
tive replacements (renewals) and preventive repairs are not initiated by system fail- 
ures, but they are carried out to prevent or at least to postpone future failures. Preven- 
tive minimal repairs make no sense with regard to the survival probability of systems. 


Minimal Repair Policy Every system failure is (and can be) removed by a minimal 
repair. 


Henceforth it is assumed that all renewals and repairs take only negligibly small times
and that, after completing a renewal or a repair, the system immediately resumes its
work. The random lifetime $T = T_1$ of the system has probability density $f(t)$, distribution
function $F(t)$, survival probability $\bar{F}(t) = 1 - F(t)$, and failure rate $\lambda(t)$.


Theorem 7.10 A system is subject to a minimal repair policy. Let $T_i$ be the time at
which its $i$th failure (minimal repair) takes place. Then the sequence $\{T_1, T_2, \ldots\}$ is a
nonhomogeneous Poisson process, the intensity function of which is given by the
failure rate $\lambda(t)$ of the system.

Proof The first failure of the system, which starts working at time $t = 0$, occurs at the
random time $T = T_1$ with density

$$f_{T_1}(t) = \lambda(t)\, e^{-\Lambda(t)}, \quad t \ge 0.$$

The same density is obtained from (7.45) or (7.49) for $n = 1$. Now assume that a
failure (minimal repair) occurs at time point $T_1 = t_1$. Then the failure probability of
the system in $[t_1, t_2)$ with $t_1 < t_2$ is nothing else than the conditional failure probability
of a system which has survived the interval $[0, t_1]$ (in either case the system has
failure rate $\lambda(t_1)$ at time $t_1$). Hence, by formula (2.98),

$$P(T_2 < t_2 \mid T_1 = t_1) = 1 - e^{-[\Lambda(t_2) - \Lambda(t_1)]}.$$

But this is formula (7.48), and just as there it can be concluded that the joint density
of the random vector $(T_1, T_2)$ is given by (7.49) with $n = 2$. Finally, induction yields
that the joint density of the random vector $(T_1, T_2, \ldots, T_n)$ is for all $n = 1, 2, \ldots$ given
by (7.49), where $\lambda(t)$ is the failure rate of the system. ∎


The minimal repair policy provides the theoretical fundament for analyzing a number 
of more sophisticated maintenance policies including preventive replacements. To 
justify preventive replacements, the assumption has to be made that the underlying 
system is aging, i.e. its failure rate is increasing (pages 87-89). 


The criterion for evaluating the efficiency of maintenance policies will be the average 
maintenance cost per unit time over an infinite time span. To establish this criterion, 
the time axis is partitioned into replacement cycles, i.e. into the times between two 
neighboring replacements. Let $L_i$ be the random length of the $i$th replacement cycle
and $C_i$ the total random maintenance cost (replacement + repair cost) in the $i$th
replacement cycle. It is assumed that the $L_i$ are independent and identically distributed
as $L$. This assumption implies that a replaced system is as good as the previous one
('as good as new') from the point of view of its lifetime. The $C_i$ are assumed to be
independent, identically distributed as $C$, and independent of the $L_i$. Then the
maintenance cost per unit time over an infinite time span is

$$K = \lim_{n \to \infty} \frac{\sum_{i=1}^{n} C_i}{\sum_{i=1}^{n} L_i}.$$

The strong law of large numbers implies

$$K = \frac{E(C)}{E(L)}. \tag{7.66}$$

For the sake of brevity, $K$ is referred to as the (long-run) maintenance cost rate. Thus,
the maintenance cost rate is equal to the mean maintenance cost per cycle divided by
the mean cycle length. In what follows, $c_p$ denotes the cost of a preventive replacement,
and $c_m$ is the cost of a minimal repair; $c_p$ and $c_m$ are constants.


Policy 1 A system is preventively replaced at fixed times $\tau, 2\tau, \ldots$. Failures between
replacements are removed by minimal repairs.

This policy reflects the common approach of preventively overhauling complicated
systems after fixed time periods whilst in between only the absolutely necessary repairs
are done. With this policy, all cycle lengths are equal to $\tau$ so that in view of
(7.65) the mean cost per cycle is equal to $c_p + c_m \Lambda(\tau)$. Hence, the corresponding
maintenance cost rate is

$$K_1(\tau) = \frac{c_p + c_m \Lambda(\tau)}{\tau}.$$

A replacement interval $\tau = \tau^*$, which minimizes $K_1(\tau)$, satisfies the condition

$$\tau \lambda(\tau) - \Lambda(\tau) = c_p / c_m.$$

If $\lambda(t)$ tends to infinity as $t \to \infty$, then there exists a unique solution $\tau = \tau^*$ of this
equation. The corresponding minimal maintenance cost rate is

$$K_1(\tau^*) = c_m \lambda(\tau^*).$$
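
The optimality condition of policy 1 can be solved numerically for a concrete failure rate. The following sketch assumes a Weibull-type intensity with illustrative parameters (theta, beta and the two cost values are assumptions, not values from the text), writes the replacement interval as tau, and uses plain bisection; it also compares the root with the closed form that the condition yields for this particular intensity.

```python
def policy1_cost_rate(tau, c_p, c_m, Lambda):
    """Cost rate of policy 1: (c_p + c_m * Lambda(tau)) / tau."""
    return (c_p + c_m * Lambda(tau)) / tau

def solve_increasing(g, lo, hi, tol=1e-10):
    """Bisection for the root of an increasing function g with g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed Weibull-type intensity: lam(t) = (beta/theta) * (t/theta)**(beta - 1),
# with integrated intensity Lambda(t) = (t/theta)**beta.
theta, beta, c_p, c_m = 100.0, 2.0, 4.0, 1.0

def lam(t):
    return (beta / theta) * (t / theta) ** (beta - 1.0)

def Lambda(t):
    return (t / theta) ** beta

# Root of  tau * lam(tau) - Lambda(tau) - c_p / c_m = 0  on an assumed bracket.
tau_star = solve_increasing(lambda t: t * lam(t) - Lambda(t) - c_p / c_m, 1e-6, 1e4)
# For this intensity the condition reads (beta - 1) * (tau/theta)**beta = c_p / c_m.
tau_closed = theta * (c_p / ((beta - 1.0) * c_m)) ** (1.0 / beta)
print(tau_star, tau_closed)
print(policy1_cost_rate(tau_star, c_p, c_m, Lambda), c_m * lam(tau_star))
```

For the assumed values the condition reduces to (tau/100)^2 = 4, so the root is 200 and both expressions for the minimal cost rate equal 0.04.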


Policy 2 A system is replaced at the first failure which occurs after a fixed time $\tau$.
Failures which occur between replacements are removed by minimal repairs.

This policy makes full use of the system lifetime so that, from this point of view, it
is preferable to policy 1. The partial uncertainty about the times of replacements,
however, leads to larger replacement costs than with policy 1. The replacement is no
longer purely preventive, so its cost is denoted as $c_r$. Thus, in practice the maintenance
cost rate of policy 2 may actually exceed the one of policy 1 if $c_r$ is sufficiently
larger than the $c_p$ used in policy 1. The residual lifetime $T_\tau$ of the system after time
point $\tau$, when having survived the interval $[0, \tau]$, has according to (2.93) mean value

$$\mu(\tau) = \frac{1}{\bar{F}(\tau)} \int_\tau^\infty \bar{F}(x)\, dx. \tag{7.67}$$

The mean maintenance cost per cycle is $c_r + c_m \Lambda(\tau)$, and the mean replacement cycle
length is $\tau + \mu(\tau)$ so that the corresponding maintenance cost rate is

$$K_2(\tau) = \frac{c_r + c_m \Lambda(\tau)}{\tau + \mu(\tau)}.$$

An optimal $\tau = \tau^*$ satisfies the necessary condition $dK_2(\tau)/d\tau = 0$, i.e.,

$$\left[\Lambda(\tau) + \frac{c_r}{c_m} - 1\right] \mu(\tau) = \tau.$$

Example 7.8 Let the system lifetime $T$ have a Rayleigh distribution with failure rate
$\lambda(t) = 2t/\theta^2$. The corresponding mean residual lifetime of the system after having
survived $[0, \tau]$ is

$$\mu(\tau) = \theta \sqrt{\pi}\; e^{(\tau/\theta)^2} \left[1 - \Phi\!\left(\frac{\sqrt{2}\,\tau}{\theta}\right)\right].$$

If $\theta = 100\ [h^{-1}]$, $c_m = 1$, and $c_r = 5$, the optimal parameters are
$\tau^* = 180\ [h]$, $K_2(\tau^*) = 0.0402$. □
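
The optimum of example 7.8 can be reproduced approximately with a direct search over the cost rate of policy 2. The sketch below is an illustration only: it assumes the integrated intensity (t/theta)^2 that belongs to the failure rate 2t/theta^2, expresses 1 - Phi(sqrt(2)*x) through the complementary error function as 0.5*erfc(x), and searches an assumed grid with step 0.1; all names are hypothetical.

```python
import math

def mean_residual_life(tau, theta):
    """Mean residual lifetime after surviving [0, tau] for failure rate 2t/theta**2."""
    x = tau / theta
    # 1 - Phi(sqrt(2) * x) equals 0.5 * erfc(x)
    return theta * math.sqrt(math.pi) * math.exp(x * x) * 0.5 * math.erfc(x)

def policy2_cost_rate(tau, theta, c_r, c_m):
    """Cost rate of policy 2: (c_r + c_m * (tau/theta)**2) / (tau + mean residual life)."""
    return (c_r + c_m * (tau / theta) ** 2) / (tau + mean_residual_life(tau, theta))

theta, c_m, c_r = 100.0, 1.0, 5.0
grid = [0.1 * i for i in range(1, 5001)]   # tau = 0.1, 0.2, ..., 500.0 (assumed search range)
tau_star = min(grid, key=lambda tau: policy2_cost_rate(tau, theta, c_r, c_m))
cost_star = policy2_cost_rate(tau_star, theta, c_r, c_m)
print(tau_star, cost_star)
```

The minimizer found this way should lie close to the rounded value of 180 hours quoted in the example, with a cost rate near 0.0402; the cost rate is rather flat around its minimum, so small shifts of the optimal interval hardly change it.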
Policy 3 The first $n - 1$ failures are removed by minimal repairs. At the time point $T_n$
of the $n$th failure, a replacement is carried out.

The random cycle length is $L = T_n$. Hence, the maintenance cost rate is

$$K_3(n) = \frac{c_r + (n - 1)\, c_m}{E(T_n)},$$

where the mean cycle length $E(T_n)$ is given by (7.46). By analyzing the behavior of
the difference $K_3(n) - K_3(n-1)$, an optimal $n = n^*$ is seen to be the smallest integer
$n$ satisfying

$$E(T_n) - [\,n - 1 + c_r/c_m\,]\, E(Y_{n+1}) \ge 0; \quad n = 1, 2, \ldots, \tag{7.68}$$

where the mean time $E(Y_n)$ between the $(n-1)$th and the $n$th minimal repair is given
by formula (7.47).

Example 7.9 Let the system lifetime $T$ have a Weibull distribution:

$$\lambda(t) = \frac{\beta}{\theta} \left(\frac{t}{\theta}\right)^{\beta - 1}, \quad \Lambda(t) = \left(\frac{t}{\theta}\right)^{\beta}, \quad \beta > 1. \tag{7.69}$$

Under this assumption condition (7.68) becomes

$$\beta n - [\,n - 1 + c_r/c_m\,] \ge 0.$$


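Hence, if $c_r > c_m$,

$$n^* = \left\lfloor \frac{1}{\beta - 1} \left(\frac{c_r}{c_m} - 1\right) \right\rfloor + 1,$$

where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. (If $x < 0$, then $\lfloor x \rfloor = 0$.) If the
aging process of the system proceeds fast ($\beta$ large), then $n^*$ is small. □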
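
The optimal number of failures per cycle under policy 3 can be cross-checked by brute force. The sketch below is an illustration with assumed parameter values. In place of formula (7.46), which is not reproduced in this section, it uses the closed form theta * Gamma(n + 1/beta) / Gamma(n) for the mean time to the nth failure; this follows from the fact that the integrated intensity evaluated at the nth failure time is gamma distributed, and it should be read as an assumption of the sketch.

```python
import math

def mean_time_to_nth_failure(n, theta, beta):
    """theta * Gamma(n + 1/beta) / Gamma(n), assumed closed form for the Weibull case."""
    return theta * math.exp(math.lgamma(n + 1.0 / beta) - math.lgamma(n))

def policy3_cost_rate(n, theta, beta, c_r, c_m):
    """Cost rate of policy 3: (c_r + (n - 1) * c_m) / E(T_n)."""
    return (c_r + (n - 1) * c_m) / mean_time_to_nth_failure(n, theta, beta)

theta, beta, c_r, c_m = 1.0, 2.0, 5.5, 1.0   # assumed illustration values
# Direct minimization of the cost rate over n = 1, ..., 200 (assumed search range).
n_brute = min(range(1, 201), key=lambda n: policy3_cost_rate(n, theta, beta, c_r, c_m))
# Smallest n with beta*n - (n - 1 + c_r/c_m) >= 0, i.e. n >= (c_r/c_m - 1) / (beta - 1).
n_condition = max(1, math.ceil((c_r / c_m - 1.0) / (beta - 1.0)))
print(n_brute, n_condition)
```

With the assumed values the threshold (c_r/c_m - 1)/(beta - 1) equals 4.5, so both approaches give n = 5.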


7.2.7 Applications To Risk Analysis 


Random point processes are key tools for quantifying the financial risk in virtually all 
branches of industry. This section uses the terminology for analyzing the financial 
risk in the insurance industry. A risky situation for an insurance company arises if it 
has to pay out a total claim amount, which exceeds its total premium income plus 
initial capital. To be able to establish the corresponding mathematical risk model, next 
the concept of a risk process has to be introduced: An insurance company starts its 
business at time $t = 0$. Claims arrive at random time points $T_1, T_2, \ldots$ and come with
the respective random claim sizes $M_1, M_2, \ldots$. Thus, the insurance company is
subjected to a random marked point process

$$\{(T_1, M_1), (T_2, M_2), \ldots\},$$

called risk process. The two components of the risk process are the claim arrival
process $\{T_1, T_2, \ldots\}$ and the claim size process $\{M_1, M_2, \ldots\}$. Let $\{N(t),\ t \ge 0\}$ be the
random counting process which belongs to the claim arrival process. Then the total
claim size $C(t)$ the company is faced with in the interval $[0, t]$ is a compound random
variable of structure

$$C(t) = \begin{cases} \sum_{i=1}^{N(t)} M_i & \text{if } N(t) \ge 1, \\ 0 & \text{if } N(t) = 0. \end{cases} \tag{7.70}$$

The compound Poisson process

$$\{C(t),\ t \ge 0\}$$

is the main ingredient of the risk model to be analyzed in this section.


To equalize the loss caused by claims and to eventually make a profit, an insurance
company imposes a premium on its clients. Let $\kappa(t)$ be the total premium income of
the insurance company in $[0, t]$. In case $C(t) < \kappa(t)$, the company has made a profit of

$$\kappa(t) - C(t)$$

in the interval $[0, t]$ (not taking into account staff and other running costs of the
company).

With an initial capital or an initial reserve $x$, which the company has at its disposal at
the start, the risk reserve at time $t$ is defined as

$$R(t) = x + \kappa(t) - C(t). \tag{7.71}$$

The corresponding (stochastic) risk reserve process is $\{R(t),\ t \ge 0\}$. If the sample
path of $\{R(t),\ t \ge 0\}$ becomes negative at a time point $t_r$, the financial expenses of
the company in $[0, t_r]$ exceed its available capital of $x + \kappa(t_r)$ at the time point $t_r$.
This leads to the definition of the ruin probability $p(x)$ of the company:

$$p(x) = P(\text{there is a positive, finite } t \text{ so that } R(t) < 0). \tag{7.72}$$

Correspondingly, the non-ruin probability or survival probability of the company is

$$q(x) = 1 - p(x).$$


These probabilities refer to an infinite time horizon. The ruin probability of the company
with regard to a finite time horizon $\tau$ is

$$p(x, \tau) = P(\text{there is a finite } t \text{ with } 0 < t \le \tau \text{ so that } R(t) < 0).$$

The ruin probabilities $p(x)$ and $p(x, \tau)$ decrease with increasing initial capital $x$.

Since ruin can only occur at the arrival time points of claims (Figure 7.5), $p(x)$ and
$p(x, \tau)$ can also be defined in the following way:

$$p(x) = P(\text{there is a positive, finite integer } n \text{ so that } R(T_n) < 0), \tag{7.73}$$

$$p(x, \tau) = P(\text{there is a positive, finite integer } n \text{ with } T_n \le \tau \text{ so that } R(T_n) < 0),$$

where $R(T_n)$ is understood to be $R(T_n + 0)$, i.e. the value of the risk reserve process
at time point $T_n$ includes the effect of the $n$th claim.

Note In the actuarial literature, claim sizes are frequently denoted as $U_i$, the initial capital as
$u$, and the ruin probability as $\psi(u)$.

Figure 7.5 Sample path of a risk process leading to ruin


In this section, the problem of determining the ruin probability is dealt with under the 
so-called 'classical assumptions:' 


1) $\{N(t),\ t \ge 0\}$ is a homogeneous Poisson process with parameter $\lambda$.

2) The claim sizes $M_1, M_2, \ldots$ are independent random variables, identically distributed
as $M$. They are independent of the $T_1, T_2, \ldots$.

3) The premium income is a linear function in $t$: $\kappa(t) = \kappa t$. The constant parameter $\kappa$
is called the premium rate.

4) The time horizon is infinite ($\tau = \infty$).


Under assumptions 1 and 2, risk analysis is subjected to a homogeneous portfolio, i.e.
claim sizes are independent, differences in the claim sizes are purely random, and the
arrival rate of claims is constant. For instance, consider a portfolio which only includes
policies covering burglaries in houses. If the houses are in a demarcated area, have
about the same security standards and comparable valuables inside, then this portfolio 
may be considered a homogeneous one. Generally, an insurance company tries to es- 
tablish its portfolios in such a way that they are approximately homogeneous. Regard- 
less of the terminology adopted, the subsequent risk analysis will not apply to an 
insurance company as a whole, but to its basic operating blocks, the homogeneous 
portfolios. 


By assumption 1 and theorem 7.2, the interarrival times between neighboring claims
are independent random variables, identically distributed as $Y$, where $Y$ has an
exponential distribution with parameter $\lambda = 1/\mu$. The mean claim size is denoted as $\nu$:

$$\mu = E(Y) \quad \text{and} \quad \nu = E(M). \tag{7.74}$$

By (7.64), under the assumptions 1 and 2, the trend function of the total claim size
process $\{C(t),\ t \ge 0\}$ is a linear function in time:

$$E(C(t)) = \frac{\nu}{\mu}\, t, \quad t \ge 0. \tag{7.75}$$


This justifies assumption 3, namely a linear premium income in time.

In the long run, an insurance company, however large its initial capital may be, cannot
be successful if the average total claim cost in any interval $[0, t]$ exceeds the
premium income in $[0, t]$. Hence, in what follows the assumption

$$\kappa\mu - \nu > 0 \tag{7.76}$$

is made. This inequality requires that the average premium income between the arrival
of two neighboring claims is larger than the mean claim size. The difference $\kappa\mu - \nu$
is called safety loading and will be denoted as $\sigma$:

$$\sigma = \kappa\mu - \nu.$$

Let distribution function and density of the claim size be

$$B(y) = P(M \le y) \quad \text{and} \quad b(y) = dB(y)/dy.$$


Derivation of an Integro-Differential Equation for $q(x)$ To derive an integro-differential
equation for the survival probability, consider what may happen in the time
interval $[0, \Delta t]$:

1) No claim arrives in $[0, \Delta t]$. Under this condition, the survival probability is

$$q(x + \kappa \Delta t).$$

This is because at the end of the interval $[0, \Delta t]$ the capital of the company has
increased by $\kappa \Delta t$ units. So the 'new' initial capital at time point $\Delta t$ is $x + \kappa \Delta t$.

2) One claim arrives in $[0, \Delta t]$ and the risk reserve remains positive. Under this
condition, the survival probability is

$$\int_0^{x + \kappa \Delta t} q(x + \kappa \Delta t - y)\, b(y)\, dy.$$

To understand this integral, remember that '$b(y)\,dy$' can be interpreted as the 'probability'
that the claim size is equal to $y$ (see comment after formula (2.50) at page 61).

3) One claim arrives in $[0, \Delta t]$ and the risk reserve becomes negative (ruin occurs).
Under this condition, the survival probability is 0.

4) At least two claims arrive in $[0, \Delta t]$. Since the Poisson process is simple, the
probability of this event is $o(\Delta t)$.

To get the unconditional survival probability, the conditional survival probabilities
1-4 have to be multiplied by the probabilities of their respective conditions and
added. By theorem 7.1, the probability that there is one claim in $[0, \Delta t]$ is

$$P(N(0, \Delta t) = 1) = \lambda \Delta t + o(\Delta t),$$

and, correspondingly, the probability that there is no claim in $[0, \Delta t]$ is

$$P(N(0, \Delta t) = 0) = 1 - \lambda \Delta t + o(\Delta t).$$

Therefore, given the initial capital $x$,

$$q(x) = [1 - \lambda \Delta t + o(\Delta t)]\; q(x + \kappa \Delta t) + [\lambda \Delta t + o(\Delta t)] \int_0^{x + \kappa \Delta t} q(x + \kappa \Delta t - y)\, b(y)\, dy + o(\Delta t).$$

From this, letting $h = \kappa \Delta t$, by some simple algebra,

$$\frac{q(x + h) - q(x)}{h} = \frac{\lambda}{\kappa}\, q(x + h) - \frac{\lambda}{\kappa} \int_0^{x + h} q(x + h - y)\, b(y)\, dy + \frac{o(h)}{h}.$$

Assuming that $q(x)$ is differentiable, letting $h \to 0$ yields

$$q'(x) = \frac{\lambda}{\kappa} \left[ q(x) - \int_0^x q(x - y)\, b(y)\, dy \right]. \tag{7.77}$$


A solution can be obtained in terms of Laplace transforms, since the integral in (7.77)
is the convolution of $q(x)$ and $b(y)$: Let $\hat{q}(s)$ and $\hat{b}(s)$ be the Laplace transforms of
$q(x)$ and $b(y)$, respectively. Then, applying the Laplace transformation to (7.77),
using its properties (2.123) and (2.127) (page 100) and replacing $\lambda$ with $1/\mu$ yields a
simple algebraic equation for $\hat{q}(s)$:

$$s\,\hat{q}(s) - q(0) = \frac{1}{\mu\kappa} \left[ \hat{q}(s) - \hat{q}(s)\, \hat{b}(s) \right].$$

Solving for $\hat{q}(s)$ gives

$$\hat{q}(s) = \frac{1}{s - \frac{1}{\kappa\mu}\,[1 - \hat{b}(s)]}\; q(0). \tag{7.78}$$

This representation of $\hat{q}(s)$ involves the survival probability of the company $q(0)$ on
condition that it has no initial capital.


Example 7.10 Let the claim size $M$ have an exponential distribution with mean value
$E(M) = \nu$. Then $M$ has density

$$b(y) = \frac{1}{\nu}\, e^{-y/\nu}, \quad y \ge 0,$$

so that

$$\hat{b}(s) = \int_0^\infty e^{-sy}\, \frac{1}{\nu}\, e^{-y/\nu}\, dy = \frac{1}{\nu s + 1}.$$

Inserting $\hat{b}(s)$ in (7.78) gives the Laplace transform of the survival probability:

$$\hat{q}(s) = \frac{\nu s + 1}{\mu\kappa\, s\,(\nu s + 1) - \nu s}\; q(0)\, \mu\kappa.$$

By introducing the coefficient

$$\alpha = \frac{\mu\kappa - \nu}{\mu\kappa}, \quad 0 < \alpha < 1, \tag{7.79}$$

$\hat{q}(s)$ simplifies to

$$\hat{q}(s) = \left[ \frac{1}{\alpha s} - \frac{1 - \alpha}{\alpha} \cdot \frac{1}{s + \alpha/\nu} \right] q(0).$$

Retransformation yields (Table 2.5, page 105)

$$q(x) = \left[ \frac{1}{\alpha} - \frac{1 - \alpha}{\alpha}\, e^{-\frac{\alpha}{\nu} x} \right] q(0). \tag{7.80}$$

If the company has infinite initial capital, then it can never experience ruin. Therefore,
$q(\infty) = 1$ so that, from (7.80), survival and ruin probability without initial capital are

$$q(0) = \alpha \quad \text{and} \quad p(0) = 1 - \alpha. \tag{7.81}$$

This gives the final formulas for the survival and ruin probability:

$$q(x) = 1 - (1 - \alpha)\, e^{-\frac{\alpha}{\nu} x}, \quad p(x) = (1 - \alpha)\, e^{-\frac{\alpha}{\nu} x}. \tag{7.82}$$

Figure 7.6 shows the graph of the ruin probability in dependence on the initial capital
$x\ [\$10^4]$ for $\alpha = 0.1$ and $\alpha = 0.2$. In both cases, $\nu = 0.4\ [\$10^4]$. From (7.79) one gets
that for $\alpha = 0.1$ the safety loading is $\sigma = 0.0\overline{4}$, and for $\alpha = 0.2$ it is $\sigma = 0.1$. □

Figure 7.6 Comparison of ruin probabilities for example 7.10
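
The closed-form ruin probability (1 - alpha) * exp(-alpha * x / nu) for exponential claim sizes can be compared with a Monte Carlo estimate. The sketch below is only an illustration under the classical assumptions: the premium rate kappa = 1, the number of paths, the random seed and the cutoff safe_level (a reserve level above which later ruin is treated as negligible) are assumptions made here, and the reserve is inspected only at claim arrival times, where ruin can occur.

```python
import math
import random

def exact_ruin_probability(x, alpha, nu):
    """Ruin probability (1 - alpha) * exp(-alpha * x / nu) for exponential claims."""
    return (1.0 - alpha) * math.exp(-alpha * x / nu)

def simulate_ruin_probability(x, alpha, nu, kappa=1.0, paths=4000, safe_level=20.0, seed=1):
    """Monte Carlo estimate of the ruin probability for initial capital x."""
    rng = random.Random(seed)
    mu = nu / ((1.0 - alpha) * kappa)   # mean interarrival time implied by alpha
    ruined = 0
    for _ in range(paths):
        reserve = x
        # Follow the reserve from claim to claim until ruin or until it is "safe".
        while 0.0 <= reserve <= safe_level:
            reserve += kappa * rng.expovariate(1.0 / mu)   # premium earned until the next claim
            reserve -= rng.expovariate(1.0 / nu)           # claim size with mean nu
        if reserve < 0.0:
            ruined += 1
    return ruined / paths

x, alpha, nu = 1.0, 0.2, 0.4   # illustration values; nu = 0.4 and alpha = 0.2 as in Figure 7.6
estimate = simulate_ruin_probability(x, alpha, nu)
exact = exact_ruin_probability(x, alpha, nu)
print(estimate, exact)
```

The estimate fluctuates with the seed and the number of paths; with a few thousand paths its standard error is below 0.01, so it should fall close to the exact value of about 0.485.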


Cramér-Lundberg Approximation If the explicit retransformation of $\hat{q}(s)$ as given
by (7.78) is not possible for a given claim size distribution, then the Cramér-Lundberg
approximation for the ruin probability $p(x)$ is an option to get reliable information on
the ruin probability if the initial capital $x$ is large compared to the mean claim size:

$$p(x) \approx \frac{\alpha}{r\gamma}\, e^{-rx}, \tag{7.83}$$

where, with $\bar{B}(y) = 1 - B(y)$, the Lundberg coefficient $r$ is defined as solution of the equation

$$\frac{1}{\mu\kappa} \int_0^\infty e^{ry}\, \bar{B}(y)\, dy = 1, \tag{7.84}$$

and the parameter $\gamma$ is given by

$$\gamma = \frac{1}{\mu\kappa} \int_0^\infty y\, e^{ry}\, \bar{B}(y)\, dy.$$

Note that in view of (7.84), $\frac{1}{\mu\kappa}\, e^{ry}\, \bar{B}(y)$ can be interpreted as the probability density
of a nonnegative random variable, and the parameter $\gamma$ is the mean value of this random
variable (for a proof of (7.83) see, e.g., Grandell (1991)).

A solution $r$ of equation (7.84) exists if the probability density of the claim size $b(y)$
has a 'short tail' to the right, which implies that large values of the claim size occur
fairly seldom.

It is interesting to compare the exact value of the ruin probability under an exponential
claim size distribution (7.82) with the corresponding approximation (7.83): For

$$B(y) = 1 - e^{-y/\nu}, \quad y \ge 0,$$

equation (7.84) becomes

$$\int_0^\infty e^{ry}\, e^{-y/\nu}\, dy = \frac{1}{1/\nu - r} = \mu\kappa,$$

so that $r = \alpha/\nu$. The corresponding parameter $\gamma$ is

$$\gamma = \frac{1}{\mu\kappa} \int_0^\infty y\, e^{-(1/\nu - r)\,y}\, dy = \frac{1}{\mu\kappa\,(1/\nu - r)} \int_0^\infty y\, (1/\nu - r)\, e^{-(1/\nu - r)\,y}\, dy = \frac{1}{\mu\kappa\,(1/\nu - r)^2}.$$

After some simple algebra:

$$\frac{\alpha}{r\gamma} = 1 - \alpha.$$

By comparing (7.82) and (7.83):

The Cramér-Lundberg approximation gives the exact value of the ruin probability
if the claim sizes are exponentially distributed.


Lundberg Inequality Assuming the existence of the Lundberg exponent $r$ as defined
by equation (7.84), the ruin probability is bounded by $e^{-rx}$:

$$p(x) \le e^{-rx}. \tag{7.85}$$

This is the famous Lundberg inequality. A proof will be given in chapter 10, page
490, by applying martingale techniques.


Both F. Lundberg and H. Cramér did their pioneering research in collective risk analysis in the 
first third of the twentieth century; see Lundberg (1964). 


Example 7.11 As in example 7.10, let $\nu = 0.4\ [\$10^4]$, but $M$ is assumed to have a
Rayleigh distribution:

$$\bar{B}(y) = P(M > y) = e^{-(y/\theta)^2}, \quad y \ge 0.$$

Since $\nu = E(M) = \theta \sqrt{\pi/4} = 0.4$, the parameter $\theta$ must be equal to $0.8/\sqrt{\pi}$. Again the
case $\alpha = 0.1$ is considered, i.e. $\mu\kappa = 4/9 = 0.\overline{4}$ and $\sigma = 2/45 = 0.0\overline{4}$. The corresponding
Lundberg exponent is solution of

$$\frac{9}{4} \int_0^\infty e^{ry}\, e^{-\pi (y/0.8)^2}\, dy = 1,$$

which gives

$$r = 0.398 \quad \text{and} \quad \gamma = \frac{9}{4} \int_0^\infty y\, e^{0.398\,y}\, e^{-\pi (y/0.8)^2}\, dy = 0.2697.$$

Figure 7.7 shows the graphs of the Cramér-Lundberg approximation (7.83) and the
upper bound (7.85) for the ruin probability $p(x)$ in dependency of the initial capital $x$:

$$p(x) \approx 0.9316\, e^{-0.398\,x}, \quad p(x) \le e^{-0.398\,x}, \quad x \ge 0.$$

Although (7.83) yields best results only for large $x$, the graph of the approximation is
everywhere lower than the upper bound (7.85). The dotted line shows once more the
exact ruin probability for exponentially distributed claim sizes with the same mean
and $\alpha$-values as in Figure 7.6. Obviously, the distribution type of the claim size has
a significant influence on $p(x)$ under otherwise the same assumptions. □
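
The Lundberg exponent of equation (7.84) generally has to be found numerically. The following sketch shows one possible approach for the Rayleigh claim sizes of example 7.11: composite Simpson integration combined with bisection. The truncation point of the integral, the bracket for r and all function names are assumptions of this sketch, not part of the text.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def lundberg_exponent(survival, mu_kappa, r_max, upper, tol=1e-10):
    """Bisection for r in (1/mu_kappa) * integral of exp(r*y) * survival(y) = 1."""
    def excess(r):
        return simpson(lambda y: math.exp(r * y) * survival(y), 0.0, upper) / mu_kappa - 1.0
    lo, hi = 0.0, r_max   # excess(0) < 0; r_max is assumed large enough that excess(r_max) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, mu_kappa = 0.1, 4.0 / 9.0

def survival(y):
    """P(M > y) for Rayleigh claim sizes with mean 0.4."""
    return math.exp(-math.pi * (y / 0.8) ** 2)

upper = 10.0   # assumed truncation point; the integrand is negligible beyond it
r = lundberg_exponent(survival, mu_kappa, r_max=5.0, upper=upper)
gamma = simpson(lambda y: y * math.exp(r * y) * survival(y), 0.0, upper) / mu_kappa
coefficient = alpha / (r * gamma)   # factor in front of exp(-r*x) in (7.83)
print(r, gamma, coefficient)
```

The computed values can be compared with those quoted in the example; deviations in the last digits may reflect rounding or the integration scheme used.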


Figure 7.7 Approximation and upper bound for the ruin probability for Example 7.11 (vertical axis: $p(x)$; horizontal axis: $x\ [\$10^4]$)


7.3 RENEWAL PROCESSES 


7.3.1 Definitions and Examples 


The motivation for this chapter is a simple maintenance policy: A system is replaced 
on every failure by a statistically equivalent new one in negligible time and, after that, 
the new system (or the 'renewed system’) immediately starts operating. In this context, 
the replacements of failed systems are also called renewals. The sequence of the system
lifetimes after renewals generates a renewal process:

Definition 7.5 An ordinary renewal process is a sequence of nonnegative, independent,
and identically distributed random variables $\{Y_1, Y_2, \ldots\}$.

Thus, $Y_i$ is the time between the $(i-1)$th and the $i$th renewal. Renewal processes do
not only play an important role in engineering, but also in the natural, economical,
and social sciences. They are a basic stochastic tool for modeling particle counting,
population development, and arrivals of customers at a service station. In the latter
context, $Y_i$ is the random time between the arrival of the $(i-1)$th and the $i$th customer.
Renewal processes are particularly important in actuarial risk analysis, namely for
modeling the arrival of claims at an insurance company, since they are a straightforward
generalization of homogeneous Poisson processes. In this section a terminology
is adopted which refers to the 'simple maintenance policy'.

If the observation of a renewal process starts at time $t = 0$ and the process had been
operating for a while before that time point, then the lifetime of the system operating
at time $t = 0$ is a 'residual lifetime' as introduced in section 2.3.4 (page 86) and will,
therefore, usually not have the same probability distribution as the lifetime of a system
after a renewal. Hence it makes sense to define a generalized renewal process
by assuming that only the $Y_2, Y_3, \ldots$ are identically distributed. This leads to

Definition 7.6 Let $\{Y_1, Y_2, \ldots\}$ be a sequence of nonnegative, independent random
variables with property that $Y_1$ has distribution function

$$F_1(t) = P(Y_1 \le t),$$

whereas the random variables $Y_2, Y_3, \ldots$ are identically distributed as $Y$ with distribution
function

$$F(t) = P(Y \le t).$$

Then $\{Y_1, Y_2, \ldots\}$ is called a delayed renewal process.


The random time point at which the $n$th renewal takes place is

$$T_n = \sum_{i=1}^{n} Y_i; \quad n = 1, 2, \ldots.$$

The random point process $\{T_1, T_2, \ldots\}$ is called the process of the time points of
renewals. The time intervals between two neighboring renewals are renewal cycles.

The corresponding counting process $\{N(t),\ t \ge 0\}$, defined by

$$N(t) = \begin{cases} \max(n;\ T_n \le t) & \text{for } t \ge T_1, \\ 0 & \text{for } t < T_1, \end{cases}$$

is called renewal counting process. Note that $N(t)$ is the random number of renewals
in $(0, t]$, i.e., a possible renewal at time point $t = 0$ is not counted. The relationship

$$N(t) \ge n \ \text{ if and only if } \ T_n \le t \tag{7.86}$$

implies

$$F_{T_n}(t) = P(T_n \le t) = P(N(t) \ge n). \tag{7.87}$$

Because of the independence of the $Y_i$, the distribution function $F_{T_n}(t)$ is the convolution
of $F_1(t)$ with the $(n-1)$th convolution power of $F$ (page 190):

$$F_{T_n}(t) = F_1 * F^{*(n-1)}(t), \quad F^{*(0)}(t) \equiv 1, \quad t \ge 0; \quad n = 1, 2, \ldots \tag{7.88}$$

If the densities

$$f_1(t) = F_1'(t) \quad \text{and} \quad f(t) = F'(t)$$

exist, then the density of $T_n$ is

$$f_{T_n}(t) = f_1 * f^{*(n-1)}(t), \quad f^{*(0)}(t) \equiv 1, \quad t \ge 0; \quad n = 1, 2, \ldots \tag{7.89}$$

Using (7.87) and

$$P(N(t) \ge n) = P(N(t) = n) + P(N(t) \ge n + 1),$$

the probability distribution of $N(t)$ is seen to be

$$P(N(t) = n) = F_{T_n}(t) - F_{T_{n+1}}(t), \quad F_{T_0}(t) \equiv 1; \quad n = 0, 1, \ldots. \tag{7.90}$$

Example 7.12 Let $\{Y_1, Y_2, \ldots\}$ be an ordinary renewal process with property that the
renewal cycle lengths $Y_i$ have an exponential distribution with parameter $\lambda$:

$$F(t) = P(Y \le t) = 1 - e^{-\lambda t}, \quad t \ge 0.$$

Then, by theorem 7.2, the corresponding counting process $\{N(t),\ t \ge 0\}$ is the homogeneous
Poisson process with intensity $\lambda$. In particular, by (7.21), $T_n$ has an Erlang
distribution with parameters $n$ and $\lambda$:

$$F_{T_n}(t) = P(T_n \le t) = e^{-\lambda t} \sum_{i=n}^{\infty} \frac{(\lambda t)^i}{i!}. \qquad \square$$
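
For exponential cycle lengths, relation (7.90) can be verified numerically: the difference of two consecutive Erlang distribution functions should reproduce the Poisson probabilities. The sketch below uses assumed illustration values for the intensity and the time point, and evaluates the Erlang distribution function through the complementary finite sum; the function names are hypothetical.

```python
import math

def erlang_cdf(t, n, lam):
    """P(T_n <= t) = 1 - exp(-lam*t) * sum over i < n of (lam*t)**i / i!  (equals 1 for n = 0)."""
    term, partial = 1.0, 0.0
    for i in range(n):
        partial += term              # adds (lam*t)**i / i!
        term *= lam * t / (i + 1)    # next term of the series
    return 1.0 - math.exp(-lam * t) * partial

def renewal_count_pmf(n, t, lam):
    """P(N(t) = n) as the difference of consecutive distribution functions, formula (7.90)."""
    return erlang_cdf(t, n, lam) - erlang_cdf(t, n + 1, lam)

lam, t = 1.5, 4.0   # assumed illustration values

def poisson(n):
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

max_gap = max(abs(renewal_count_pmf(n, t, lam) - poisson(n)) for n in range(25))
print(max_gap)   # expected to be at the level of floating-point rounding
```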
Apart from the homogeneous Poisson process, there are two other important ordinary
renewal processes for which the convolution powers of the renewal cycle length
distributions exist explicitly so that the distribution functions of the renewal time points
$T_n$ can be given:

1) Erlang Distribution Let the renewal cycle length $Y$ have an Erlang distribution with
parameters $m$ and $\lambda$. Then $T_n$ is the sum of $mn$ independent, identically distributed
exponential random variables with parameter $\lambda$. Therefore, $T_n$ has an Erlang distribution
with parameters $mn$ and $\lambda$:

$$F^{*(n)}(t) = P(T_n \le t) = e^{-\lambda t} \sum_{i=mn}^{\infty} \frac{(\lambda t)^i}{i!}, \quad t \ge 0. \tag{7.91}$$

This result is of general importance, since the probability distribution of any nonnegative
random variable can be arbitrarily accurately approximated by an Erlang distribution
by proper choice of the parameters of this distribution.

2) Normal Distribution Let the renewal cycle length $Y$ have a normal distribution
with parameters $\mu$ and $\sigma$, $\mu > 3\sigma$. The assumption $\mu > 3\sigma$ is necessary for making
sure that the cycle lengths are practically nonnegative. (Renewal theory, however, has
been extended to negative 'cycle lengths' as well.) Since the sum of independent, normally
distributed random variables is again normally distributed, where the parameters
of the sum are obtained by summing up the parameters of the summands (formula
(4.72), page 191), $T_n$ has distribution function

$$F^{*(n)}(t) = P(T_n \le t) = \Phi\!\left(\frac{t - n\mu}{\sigma\sqrt{n}}\right), \quad t \ge 0. \tag{7.92}$$

This result has a more general potential for applications: Since $T_n$ is the sum of $n$
independent, identically distributed random variables, by the central limit theorem
(theorem 5.6), $T_n$ has approximately the distribution function (7.92) if $n$ is sufficiently
large:

$$T_n \approx N(n\mu, \sigma^2 n) \quad \text{if } n \ge 20.$$

Example 7.13 The distribution function of $T_n$ can be used to solve the spare part
problem: How many spare parts (spare systems) are necessary for making sure that
the renewal process can be maintained over the interval $[0, t]$ with probability $1 - \alpha$?

This requires determining the smallest integer $n$ satisfying

$$1 - F_{T_n}(t) = P(N(t) < n) \ge 1 - \alpha.$$

For instance, let $\mu = E(Y) = 8$ and $\sigma^2 = \operatorname{Var}(Y) = 25$. If $t = 200$ and $1 - \alpha = 0.99$, then

$$1 - F_{T_n}(200) = 1 - \Phi\!\left(\frac{200 - 8n}{5\sqrt{n}}\right) \ge 1 - \alpha = 0.99$$

is equivalent to

$$z_{0.01} = 2.32 \le \frac{8n - 200}{5\sqrt{n}}.$$

Thus, at least $n_{\min} = 34$ spare parts have to be in stock to ensure that with probability
0.99 every failed part can be replaced by a new one over the interval $(0, 200]$. In
view of $n_{\min} \ge 20$, the application of the normal approximation to the distribution of
$T_n$ is justified. □
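
The spare part calculation of example 7.13 can be sketched as a simple search over n, using the normal approximation for the distribution of the nth renewal time. The function name is hypothetical; the numerical inputs are those of the example.

```python
import math
from statistics import NormalDist

def min_spare_parts(t, mu, sigma, coverage):
    """Smallest n with 1 - Phi((t - n*mu) / (sigma*sqrt(n))) >= coverage."""
    phi = NormalDist().cdf   # standard normal distribution function
    n = 1
    while 1.0 - phi((t - n * mu) / (sigma * math.sqrt(n))) < coverage:
        n += 1
    return n

n_min = min_spare_parts(t=200.0, mu=8.0, sigma=5.0, coverage=0.99)
print(n_min)
```

For n = 33 the standardized value (8n - 200)/(5*sqrt(n)) is about 2.23, below the 0.99-quantile of the standard normal distribution, whereas for n = 34 it is about 2.47, so the search stops at n = 34.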


7.3.2 Renewal Function 


7.3.2.1 Renewal Equations 


The mean number of renewals which occur in a given time interval is of great practi- 
cal and theoretical importance. 


Definition 7.7 The mean value of the random number $N(t)$ of renewals occurring in
$(0, t]$ as a function of $t$ is called renewal function.

Thus, with the terminology and the notation introduced in section 6.2, the renewal
function is the trend function of the renewal counting process $\{N(t),\ t \ge 0\}$:

$$m(t) = E(N(t)), \quad t \ge 0.$$

To be, however, in line with the majority of publications on renewal theory, in what
follows, the renewal functions belonging to an ordinary and a delayed renewal process
are denoted as $H(t)$ and $H_1(t)$, respectively. If not stated otherwise, it is assumed
throughout section 7.3 that the densities of $Y$ and $Y_1$ exist:

$$dF(t) = f(t)\,dt \quad \text{and} \quad dF_1(t) = f_1(t)\,dt. \tag{7.93}$$

In this case, the first derivatives of $H_1(t)$ and $H(t)$ also exist:

$$h_1(t) = \frac{dH_1(t)}{dt}, \quad h(t) = \frac{dH(t)}{dt}.$$

The functions $h_1(t)$ and $h(t)$ are the renewal densities of a delayed and of an ordinary
renewal process, respectively. From (2.9) (page 46), a sum representation of the
renewal function is

$$H_1(t) = E(N(t)) = \sum_{n=1}^{\infty} P(N(t) \ge n). \tag{7.94}$$

In view of (7.87) and (7.94),

$$H_1(t) = \sum_{n=1}^{\infty} F_1 * F^{*(n-1)}(t). \tag{7.95}$$

In particular, the renewal function of an ordinary renewal process is

$$H(t) = \sum_{n=1}^{\infty} F^{*(n)}(t). \tag{7.96}$$

By differentiation of (7.95) and (7.96) with respect to $t$, one obtains sum representations
of the respective renewal densities:

$$h_1(t) = \sum_{n=1}^{\infty} f_1 * f^{*(n-1)}(t), \quad h(t) = \sum_{n=1}^{\infty} f^{*(n)}(t).$$

Remark These sum representations allow a useful probabilistic interpretation of the renewal
density: For $\Delta t$ sufficiently small,

$$h_1(t)\,\Delta t \quad \text{or} \quad h(t)\,\Delta t,$$

respectively, are approximately the probabilities of the occurrence of a renewal in the interval
$[t, t + \Delta t]$. (Compare to the remark after formula (2.50), page 61.)

By (7.95) and the definition of the convolution power of distribution functions,

$$\begin{aligned}
H_1(t) &= \sum_{n=0}^{\infty} F_1 * F^{*(n)}(t) \\
&= F_1(t) + \sum_{n=1}^{\infty} \int_0^t F_1 * F^{*(n-1)}(t - x)\, dF(x) \\
&= F_1(t) + \int_0^t \sum_{n=1}^{\infty} \left( F_1 * F^{*(n-1)}(t - x) \right) dF(x).
\end{aligned}$$

Again by (7.95), the integrand is equal to $H_1(t - x)$. Hence, $H_1(t)$ satisfies

$$H_1(t) = F_1(t) + \int_0^t H_1(t - x)\, dF(x). \tag{7.97}$$


By assumption (7.93), the integral in (7.97) is the convolution $H_1 * f$ of the renewal
function $H_1$ with $f$. In particular, the renewal function $H(t)$ of an ordinary renewal
process satisfies the integral equation

$$H(t) = F(t) + \int_0^t H(t - x)\, dF(x). \tag{7.98}$$

A heuristic derivation of formula (7.98) can be done by conditioning with regard to
the time point of the first renewal: Given the first renewal occurs at time $x$, the mean
number of renewals in $[0, t]$ is

$$[1 + H(t - x)], \quad 0 < x \le t.$$

Since the first renewal occurs at time $x$ with 'probability' $dF(x) = f(x)\,dx$, taking into
account all possible values of $x$ in $[0, t]$ yields (7.98). The same argument yields an
integral equation for the renewal function of a delayed renewal process:

$$H_1(t) = F_1(t) + \int_0^t H(t - x)\, dF_1(x). \tag{7.99}$$

This is because after the first renewal at time $x$ the process develops in $(x, t]$ as an ordinary
renewal process. Since the convolution is a commutative operation, the renewal
equations can be rewritten. For instance, integral equation (7.97) is equivalent to

$$H_1(t) = F_1(t) + \int_0^t F(t - x)\, dH_1(x). \tag{7.100}$$

The equations (7.97)-(7.100) are called renewal equations.

By differentiating the renewal equations (7.97) to (7.99) with respect to $t$, one obtains
analogous integral equations for $h_1(t)$ and $h(t)$:

$$h_1(t) = f_1(t) + \int_0^t h_1(t - x)\, f(x)\, dx, \tag{7.101}$$

$$h(t) = f(t) + \int_0^t h(t - x)\, f(x)\, dx, \tag{7.102}$$

$$h_1(t) = f_1(t) + \int_0^t h(t - x)\, f_1(x)\, dx. \tag{7.103}$$

Generally, solutions of the renewal equations including equations (7.101) to (7.103)
can only be obtained by numerical methods. Since, however, all these integral equations
involve convolutions, it is easily possible to find their solutions in the image
space of the Laplace transformation. To see this, let $\hat{h}_1(s)$, $\hat{h}(s)$, $\hat{f}_1(s)$, and $\hat{f}(s)$ in
this order be the Laplace transforms of $h_1(t)$, $h(t)$, $f_1(t)$, and $f(t)$. Then, by (2.127),
applying the Laplace transformation to (7.101) and (7.102) yields algebraic equations
for $\hat{h}_1(s)$ and $\hat{h}(s)$:

$$\hat{h}_1(s) = \hat{f}_1(s) + \hat{h}_1(s) \cdot \hat{f}(s), \quad \hat{h}(s) = \hat{f}(s) + \hat{h}(s) \cdot \hat{f}(s).$$

The solutions are

$$\hat{h}_1(s) = \frac{\hat{f}_1(s)}{1 - \hat{f}(s)}, \quad \hat{h}(s) = \frac{\hat{f}(s)}{1 - \hat{f}(s)}. \tag{7.104}$$

Thus, for ordinary renewal processes there is a one-to-one correspondence between
the renewal function and the probability distribution of the cycle length. By (2.120),
the Laplace transforms of the corresponding renewal functions are

$$\hat{H}_1(s) = \frac{\hat{f}_1(s)}{s\,(1 - \hat{f}(s))}, \quad \hat{H}(s) = \frac{\hat{f}(s)}{s\,(1 - \hat{f}(s))}. \tag{7.105}$$

Integral Equations of Renewal Type The renewal equations (7.97) to (7.100) and
other, equivalent ones derived from these belong to the broader class of integral equations
of renewal type. A function $Z(t)$ is said to satisfy an integral equation of renewal
type if for any function $a(t)$, which is integrable on $[0, \infty)$, and for any probability
density $f(x)$ of a nonnegative random variable,

$$Z(t) = a(t) + \int_0^t Z(t - x)\, f(x)\, dx. \tag{7.106}$$

A function $Z(t)$ satisfying (7.106) need not be the trend function of a renewal counting
process; see example 7.17. As proved in Feller (1971), the general solution of the
integral equation (7.106) has the unique structure

$$Z(t) = a(t) + \int_0^t a(t - x)\, h(x)\, dx,$$

where $h(t)$ is the renewal density of the ordinary renewal process belonging to $f(x)$.
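Example 7.14 Let $f_1(t) = f(t) = \lambda e^{-\lambda t}$, $t \ge 0$. The Laplace transform of $f(t)$ is

$$\hat f(s) = \frac{\lambda}{s+\lambda}.$$

By the right equation in (7.105),

$$\hat H(s) = \frac{\lambda}{s+\lambda} \Big/ \left( s - \frac{\lambda s}{s+\lambda} \right) = \frac{\lambda}{s^2}.$$

The corresponding preimage (Table 2.5, page 105) is $H(t) = \lambda t$. Thus, an ordinary renewal process has exponentially distributed cycle lengths with parameter $\lambda$ if and only if its renewal function is given by $H(t) = \lambda t$. □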
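When no Laplace inverse is available, the renewal equation (7.98) can be approximated on a grid. The following is only a minimal sketch under stated assumptions, not a method taken from the text: the Stieltjes integral is replaced by lower and upper Riemann-Stieltjes sums (which bracket $H$ because $H$ is nondecreasing), and the function name `renewal_bounds` and its parameters are hypothetical. The exponential case of example 7.14 serves as a check, since there $H(t) = \lambda t$ is known.

```python
import numpy as np

def renewal_bounds(F, t_max, n):
    """Bracket the renewal function H on an equidistant grid.

    F     : vectorized distribution function of the cycle length
    t_max : right end of the time interval [0, t_max]
    n     : number of grid steps
    Returns the grid and lower/upper approximations of H.
    """
    t = np.linspace(0.0, t_max, n + 1)
    Fv = F(t)
    dF = np.diff(Fv)              # dF[j-1] = F(t_j) - F(t_{j-1})
    lo = np.zeros(n + 1)
    up = np.zeros(n + 1)
    for k in range(1, n + 1):
        # Lower sum: on (t_{j-1}, t_j] replace H(t_k - x) by H(t_k - t_j).
        lo[k] = Fv[k] + np.dot(lo[k - 1::-1], dF[:k])
        # Upper sum: replace H(t_k - x) by H(t_k - t_{j-1}); the term
        # j = 1 contains H(t_k) itself and is moved to the left-hand side.
        up[k] = (Fv[k] + np.dot(up[k - 1:0:-1], dF[1:k])) / (1.0 - dF[0])
    return t, lo, up

# Check against example 7.14 (lambda = 1, so H(t) = t).
t, lo, up = renewal_bounds(lambda x: 1.0 - np.exp(-x), 5.0, 500)
print(lo[-1], up[-1], 0.5 * (lo[-1] + up[-1]))
```

The average of the two sums is usually a much better approximation than either bound alone; refining the grid narrows the bracket.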


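Example 7.15 Let the cycle length of an ordinary renewal process be a mixture of two exponential distributions:

$$f(t) = p\,\lambda_1 e^{-\lambda_1 t} + (1-p)\,\lambda_2 e^{-\lambda_2 t}$$

with $0 \le p \le 1$, $\lambda_1 > 0$, $\lambda_2 > 0$, $t \ge 0$. With its three free parameters, this distribution can be expected to provide a good fit to many lifetime data sets. The Laplace transform of $f(t)$ is

$$\hat f(s) = \frac{p\,\lambda_1}{s+\lambda_1} + \frac{(1-p)\,\lambda_2}{s+\lambda_2}.$$

Hence, the right formula of (7.104) yields the Laplace transform of the corresponding renewal density

$$\hat h(s) = \frac{\dfrac{p\,\lambda_1}{s+\lambda_1} + \dfrac{(1-p)\,\lambda_2}{s+\lambda_2}}{1 - \dfrac{p\,\lambda_1}{s+\lambda_1} - \dfrac{(1-p)\,\lambda_2}{s+\lambda_2}}.$$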


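From this, by identical transformations,

$$\begin{aligned}
\hat h(s) &= \frac{[p\lambda_1 + (1-p)\lambda_2]\,s + \lambda_1\lambda_2}{(s+\lambda_1)(s+\lambda_2) - [p\lambda_1 + (1-p)\lambda_2]\,s - \lambda_1\lambda_2} \\
&= \frac{[p\lambda_1 + (1-p)\lambda_2]\,s + \lambda_1\lambda_2}{s^2 + (1-p)\lambda_1 s + p\lambda_2 s} \\
&= \frac{p\lambda_1 + (1-p)\lambda_2}{s + (1-p)\lambda_1 + p\lambda_2} + \frac{\lambda_1\lambda_2}{s\,[s + (1-p)\lambda_1 + p\lambda_2]}.
\end{aligned}$$

Retransformation is easily done by making use of Table 2.5 (page 105):

$$h(t) = \frac{\lambda_1\lambda_2}{(1-p)\lambda_1 + p\lambda_2} - \left[ \frac{\lambda_1\lambda_2}{(1-p)\lambda_1 + p\lambda_2} - p\lambda_1 - (1-p)\lambda_2 \right] e^{-[(1-p)\lambda_1 + p\lambda_2]\,t}, \quad t \ge 0.$$

After some algebra,

$$h(t) = \frac{\lambda_1\lambda_2}{(1-p)\lambda_1 + p\lambda_2} + p(1-p)\,\frac{(\lambda_1 - \lambda_2)^2}{(1-p)\lambda_1 + p\lambda_2}\, e^{-[(1-p)\lambda_1 + p\lambda_2]\,t}, \quad t \ge 0.$$

Integration yields the renewal function:

$$H(t) = \frac{\lambda_1\lambda_2}{(1-p)\lambda_1 + p\lambda_2}\, t + p(1-p) \left( \frac{\lambda_1 - \lambda_2}{(1-p)\lambda_1 + p\lambda_2} \right)^2 \left( 1 - e^{-[(1-p)\lambda_1 + p\lambda_2]\,t} \right).$$

Mean value $\mu = E(Y)$ and variance $\sigma^2 = Var(Y)$ of the renewal cycle length $Y$ are

$$\mu = \frac{p}{\lambda_1} + \frac{1-p}{\lambda_2} = \frac{(1-p)\lambda_1 + p\lambda_2}{\lambda_1\lambda_2}, \qquad \sigma^2 = \mu^2 + 2p(1-p) \left( \frac{\lambda_1 - \lambda_2}{\lambda_1\lambda_2} \right)^2.$$

With these parameters, the representation of the renewal function can be simplified:

$$H(t) = \frac{t}{\mu} + \frac{\sigma^2 - \mu^2}{2\mu^2} \left( 1 - e^{-[(1-p)\lambda_1 + p\lambda_2]\,t} \right), \quad t \ge 0.$$

For $\lambda_1 = \lambda_2 = \lambda$ and $p = 1$ this representation of $H(t)$ reduces to $H(t) = \lambda t$. □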
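The formulas of example 7.15 can be checked numerically. The sketch below is an illustration only; the function names and the parameter values $p = 0.3$, $\lambda_1 = 1$, $\lambda_2 = 3$ are assumptions chosen for the check, not data from the text. It evaluates the renewal function in both of its representations and tests whether the renewal density satisfies the integral equation (7.102), using a simple trapezoidal rule for the convolution integral.

```python
import numpy as np

def cycle_density(t, p, l1, l2):
    """Mixture of two exponential densities."""
    return p * l1 * np.exp(-l1 * t) + (1 - p) * l2 * np.exp(-l2 * t)

def renewal_density(t, p, l1, l2):
    b = (1 - p) * l1 + p * l2
    return l1 * l2 / b + p * (1 - p) * (l1 - l2) ** 2 / b * np.exp(-b * t)

def renewal_function(t, p, l1, l2):
    b = (1 - p) * l1 + p * l2
    return l1 * l2 * t / b + p * (1 - p) * ((l1 - l2) / b) ** 2 * (1 - np.exp(-b * t))

def renewal_function_moments(t, p, l1, l2):
    """Same function, written with mean and variance of the cycle length."""
    b = (1 - p) * l1 + p * l2
    mu = p / l1 + (1 - p) / l2
    var = 2 * p / l1**2 + 2 * (1 - p) / l2**2 - mu**2
    return t / mu + (var - mu**2) / (2 * mu**2) * (1 - np.exp(-b * t))

def renewal_equation_residual(t, p, l1, l2, n=2000):
    """h(t) - f(t) - int_0^t h(t-x) f(x) dx, trapezoidal rule with n steps."""
    x = np.linspace(0.0, t, n + 1)
    g = renewal_density(t - x, p, l1, l2) * cycle_density(x, p, l1, l2)
    integral = np.sum(0.5 * (g[1:] + g[:-1])) * (t / n)
    return renewal_density(t, p, l1, l2) - cycle_density(t, p, l1, l2) - integral

print(renewal_function(2.0, 0.3, 1.0, 3.0),
      renewal_function_moments(2.0, 0.3, 1.0, 3.0),
      renewal_equation_residual(2.0, 0.3, 1.0, 3.0))
```

A residual close to zero indicates that the closed-form density is consistent with (7.102) at the chosen time point.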


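More explicit formulas for the renewal function of ordinary renewal processes exist for the following two classes of cycle length distributions:

1) Erlang Distribution Let the cycle lengths be Erlang distributed with parameters $m$ and $\lambda$. Then, by (7.87) and (7.91),

$$H(t) = e^{-\lambda t} \sum_{n=1}^{\infty} \sum_{i=mn}^{\infty} \frac{(\lambda t)^i}{i!}. \tag{7.107}$$

In particular,

$$m = 1: \quad H(t) = \lambda t \quad \text{(homogeneous Poisson process)}$$
$$m = 2: \quad H(t) = \frac{1}{2} \left[ \lambda t - \frac{1}{2} + \frac{1}{2}\, e^{-2\lambda t} \right]$$
$$m = 3: \quad H(t) = \frac{1}{3} \left[ \lambda t - 1 + \frac{2}{\sqrt{3}}\, e^{-1.5 \lambda t} \sin\!\left( \frac{\sqrt{3}}{2}\, \lambda t + \frac{\pi}{3} \right) \right]$$
$$m = 4: \quad H(t) = \frac{1}{4} \left[ \lambda t - \frac{3}{2} + \frac{1}{2}\, e^{-2\lambda t} + \sqrt{2}\, e^{-\lambda t} \sin\!\left( \lambda t + \frac{\pi}{4} \right) \right]$$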
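The double sum (7.107) is easy to evaluate, because its inner sum is the probability that a Poisson random variable with parameter $\lambda t$ is at least $mn$. The following sketch, with the hypothetical function name `erlang_renewal` and an assumed truncation index `i_max`, compares the truncated series with the closed forms for $m = 2$ and $m = 3$.

```python
import math

def erlang_renewal(t, m, lam, i_max=400):
    """Renewal function for Erlang(m, lam) cycle lengths via (7.107).

    The series is truncated at i_max, which is ample for moderate lam * t.
    """
    x = lam * t
    # Poisson probabilities p[i] = exp(-x) * x**i / i!, built recursively.
    p = [math.exp(-x)]
    for i in range(1, i_max + 1):
        p.append(p[-1] * x / i)
    # tail[i] = sum of p[j] for j >= i (truncated at i_max).
    tail = [0.0] * (i_max + 2)
    for i in range(i_max, -1, -1):
        tail[i] = tail[i + 1] + p[i]
    return sum(tail[m * n] for n in range(1, i_max // m + 1))

def closed_form_m2(t, lam):
    return 0.5 * (lam * t - 0.5 + 0.5 * math.exp(-2.0 * lam * t))

def closed_form_m3(t, lam):
    return (lam * t - 1.0
            + 2.0 / math.sqrt(3.0) * math.exp(-1.5 * lam * t)
            * math.sin(math.sqrt(3.0) / 2.0 * lam * t + math.pi / 3.0)) / 3.0

print(erlang_renewal(3.0, 2, 1.0), closed_form_m2(3.0, 1.0))
print(erlang_renewal(3.0, 3, 1.0), closed_form_m3(3.0, 1.0))
```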


2) Normal Distribution Let the cycle lengths be normally distributed with mean value $\mu$ and variance $\sigma^2$, $\mu > 3\sigma$. From (7.87) and (7.92),

$$H(t) = \sum_{n=1}^{\infty} \Phi\!\left( \frac{t - n\mu}{\sigma \sqrt{n}} \right). \tag{7.108}$$


This sum representation is very convenient for numerical computations, since already 
the sum of the first few terms approximates the renewal function with sufficient accu- 
racy. 
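A minimal sketch of how the sum (7.108) might be evaluated is given below. The function names, the stopping tolerance, and the parameter values $t = 100$, $\mu = 10$, $\sigma = 2$ are assumptions made for illustration. The standard normal distribution function is expressed through the error function from the Python standard library.

```python
import math

def std_normal_cdf(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_renewal(t, mu, sigma, tol=1e-12):
    """Sum (7.108); terms are added until they are negligible."""
    total, n = 0.0, 1
    while True:
        term = std_normal_cdf((t - n * mu) / (sigma * math.sqrt(n)))
        total += term
        # Beyond n * mu > t the terms decrease rapidly towards zero.
        if n * mu > t and term < tol:
            break
        n += 1
    return total

print(normal_renewal(100.0, 10.0, 2.0))
```

For these parameters only a handful of terms around $n = t/\mu$ differ noticeably from 0 or 1, which is why the sum converges so quickly.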

As shown in example 7.14, an ordinary renewal process has renewal function

$$H(t) = \lambda t = t/\mu \quad \text{if and only if} \quad f(t) = \lambda e^{-\lambda t}, \ t \ge 0,$$

where $\mu = E(Y)$. An interesting question is whether, for given $F(t)$, a delayed renewal process exists which also has renewal function $H_1(t) = t/\mu$.

Theorem 7.11 Let $\{Y_1, Y_2, \ldots\}$ be a delayed renewal process with cycle lengths $Y_2, Y_3, \ldots$ being identically distributed as $Y$. If $Y$ has finite mean value $\mu$ and distribution function $F(t) = P(Y \le t)$, then $\{Y_1, Y_2, \ldots\}$ has renewal function

$$H_1(t) = t/\mu \tag{7.109}$$

if and only if the length of the first renewal cycle $Y_1$ has density $f_1(t) \equiv f_S(t)$, where

$$f_S(t) = \frac{1}{\mu}\,(1 - F(t)), \quad t \ge 0. \tag{7.110}$$

Equivalently, $\{Y_1, Y_2, \ldots\}$ has renewal function (7.109) if and only if $Y_1$ has distribution function $F_1(t) \equiv F_S(t)$ with

$$F_S(t) = \frac{1}{\mu} \int_0^t (1 - F(x))\, dx, \quad t \ge 0. \tag{7.111}$$

Proof Let $\hat f(s)$ and $\hat f_S(s)$ be the respective Laplace transforms of $f(t)$ and $f_S(t)$. By applying the Laplace transformation to both sides of (7.110),

$$\hat f_S(s) = \frac{1}{\mu s}\,(1 - \hat f(s)).$$

Replacing in the left equation of (7.105) $\hat f_1(s)$ with $\hat f_S(s)$ yields the Laplace transform of the corresponding renewal function $H_1(t) = H_S(t)$:

$$\hat H_S(s) = \frac{1}{\mu s^2}.$$

Retransformation of $\hat H_S(s)$ gives the desired result: $H_S(t) = t/\mu$. ■

The first two moments of $S$ are

$$E(S) = \frac{\mu^2 + \sigma^2}{2\mu} \quad \text{and} \quad E(S^2) = \frac{\mu_3}{3\mu}, \tag{7.112}$$

where $\sigma^2 = Var(Y)$ and $\mu_3 = E(Y^3)$.

The random variable $S$ with density (7.110) plays an important role in characterizing stationary renewal processes (section 7.3.5).


7.3.2.2 Bounds on the Renewal Function 


Generally, integral equations of renewal type have to be solved by numerical methods. 
Hence, bounds on H(t), which only require information on one or more numerical pa- 
rameters of the cycle length distribution, are of special interest. This section presents 
bounds on the renewal function of ordinary renewal processes. 


1) Elementary Bounds By definition of $T_n$,

$$\max_{1 \le i \le n} Y_i \le \sum_{i=1}^{n} Y_i = T_n.$$

Hence, for any $t$ with $F(t) < 1$,

$$F^{*(n)}(t) = P(T_n \le t) \le P\Big( \max_{1 \le i \le n} Y_i \le t \Big) = [F(t)]^n.$$

Summing from $n = 1$ to $\infty$ on both sides of this inequality, the sum representation of the renewal function (7.96) and the geometric series (2.16) at page 48 yield

$$F(t) \le H(t) \le \frac{F(t)}{1 - F(t)}.$$

The left-hand side of this inequality is the first term of the sum (7.96). These bounds are only useful for small $t$.


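2) Marshall Bounds Let $\mathcal{F} = \{t;\ t \ge 0,\ F(t) < 1\}$, $\mu = E(Y)$, $\bar F(t) = 1 - F(t)$, and

$$a_0 = \inf_{t \in \mathcal{F}} \frac{F(t) - F_S(t)}{\bar F(t)}, \qquad a_1 = \sup_{t \in \mathcal{F}} \frac{F(t) - F_S(t)}{\bar F(t)},$$

where $F_S(t)$ is given by (7.111). Then,

$$\frac{t}{\mu} + a_0 \le H(t) \le \frac{t}{\mu} + a_1. \tag{7.113}$$

The derivation of these bounds is straightforward and very instructive: According to the definition of $a_0$ and $a_1$,

$$a_0\, \bar F(t) \le F(t) - F_S(t) \le a_1\, \bar F(t).$$

Convolution of both sides with $F^{*(n)}(t)$ leads to

$$a_0 \left[ F^{*(n)}(t) - F^{*(n+1)}(t) \right] \le F^{*(n+1)}(t) - F_S * F^{*(n)}(t) \le a_1 \left[ F^{*(n)}(t) - F^{*(n+1)}(t) \right].$$

In view of (7.96) and theorem 7.11, summing up from $n = 0$ to $\infty$ on both sides of this inequality proves (7.113). Since

$$\frac{F(t) - F_S(t)}{\bar F(t)} \ge -F_S(t) \ge -1 \quad \text{for all } t \ge 0,$$

formula (7.113) implies a simpler lower bound on $H(t)$:

$$H(t) \ge \frac{t}{\mu} - F_S(t) \ge \frac{t}{\mu} - 1.$$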
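Let $\lambda_S(t) = f_S(t) / \bar F_S(t)$ be the failure rate belonging to $F_S(t)$:

$$\lambda_S(t) = \frac{\bar F(t)}{\int_t^{\infty} \bar F(x)\, dx}.$$

Then $a_0$ and $a_1$ can be rewritten as follows:

$$a_0 = \frac{1}{\mu} \inf_{t \in \mathcal{F}} \frac{1}{\lambda_S(t)} - 1, \qquad a_1 = \frac{1}{\mu} \sup_{t \in \mathcal{F}} \frac{1}{\lambda_S(t)} - 1.$$

Thus, (7.113) becomes

$$\frac{t}{\mu} + \frac{1}{\mu} \inf_{t \in \mathcal{F}} \frac{1}{\lambda_S(t)} - 1 \le H(t) \le \frac{t}{\mu} + \frac{1}{\mu} \sup_{t \in \mathcal{F}} \frac{1}{\lambda_S(t)} - 1. \tag{7.114}$$

Since, with $\lambda(t)$ denoting the failure rate belonging to $F(t)$,

$$\inf_{t \in \mathcal{F}} \lambda(t) \le \inf_{t \in \mathcal{F}} \lambda_S(t) \quad \text{and} \quad \sup_{t \in \mathcal{F}} \lambda(t) \ge \sup_{t \in \mathcal{F}} \lambda_S(t),$$

the bounds (7.114) can be simplified:

$$\frac{t}{\mu} + \frac{1}{\mu} \inf_{t \in \mathcal{F}} \frac{1}{\lambda(t)} - 1 \le H(t) \le \frac{t}{\mu} + \frac{1}{\mu} \sup_{t \in \mathcal{F}} \frac{1}{\lambda(t)} - 1. \tag{7.115}$$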


3) Lorden's Upper Bound If $\mu = E(Y)$ and $\mu_2 = E(Y^2)$, then

$$H(t) \le \frac{t}{\mu} + \frac{\mu_2}{\mu^2} - 1. \tag{7.116}$$

4) Brown's Upper Bound If $F(t)$ is IFR, then (7.116) can be improved:

$$H(t) \le \frac{t}{\mu} + \frac{\mu_2}{2\mu^2} - 1.$$

5) Barlow and Proschan Bounds If $F(t)$ is IFR, then

$$\frac{t}{\int_0^t \bar F(x)\, dx} - 1 \le H(t) \le \frac{t\, F(t)}{\int_0^t \bar F(x)\, dx}. \tag{7.117}$$

Example 7.16 Let

$$F(t) = (1 - e^{-t})^2, \quad t \ge 0,$$

be the distribution function of the cycle length $Y$ of an ordinary renewal process. In this case, $\mu = E(Y) = 3/2$ and

$$\bar F_S(t) = \frac{1}{\mu} \int_t^{\infty} \bar F(x)\, dx = \frac{2}{3} \left( 2 - \frac{1}{2}\, e^{-t} \right) e^{-t}, \quad t \ge 0.$$

Therefore, the failure rates belonging to $F(t)$ and $F_S(t)$ are (Figure 7.8)

$$\lambda(t) = \frac{2\,(1 - e^{-t})}{2 - e^{-t}}, \qquad \lambda_S(t) = \frac{2\,(2 - e^{-t})}{4 - e^{-t}}, \quad t \ge 0.$$

Both failure rates are strictly increasing in $t$ and have properties

$$\lambda(0) = 0, \ \lambda(\infty) = 1 \quad \text{and} \quad \lambda_S(0) = 2/3, \ \lambda_S(\infty) = 1.$$

Hence, the respective bounds (7.114) and (7.115) are (Figure 7.9)

$$\frac{2}{3}\, t - \frac{1}{3} \le H(t) \le \frac{2}{3}\, t \quad \text{and} \quad \frac{2}{3}\, t - \frac{1}{3} \le H(t) < \infty.$$

Figure 7.8 Failure rates        Figure 7.9 Bounds for the renewal function

In this case, the upper bound in (7.115) contains no information on the renewal function. Figure 7.9 compares the bounds (7.114) with the exact graph of the renewal function given in example 7.15. The deviation of the lower bound from $H(t)$ is negligibly small for $t \ge 3$. □
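The bounds of example 7.16 can be compared with the renewal function numerically. The sketch below rests on an assumption that is not spelled out in the text: for $F(t) = (1 - e^{-t})^2$ one has $\hat f(s) = 2/((s+1)(s+2))$, so that (7.105) gives $\hat H(s) = 2/(s^2 (s+3))$, whose preimage is $H(t) = \frac{2}{3} t - \frac{2}{9}(1 - e^{-3t})$. The function names are hypothetical.

```python
import math

MU = 1.5  # E(Y) for F(t) = (1 - exp(-t))**2

def renewal_function(t):
    """Preimage of 2 / (s**2 * (s + 3)), obtained by partial fractions."""
    return 2.0 * t / 3.0 - (2.0 / 9.0) * (1.0 - math.exp(-3.0 * t))

def lambda_s(t):
    """Failure rate belonging to F_S(t) in example 7.16."""
    return 2.0 * (2.0 - math.exp(-t)) / (4.0 - math.exp(-t))

def marshall_bounds(t):
    """Bounds (7.114); 1/lambda_S decreases from 3/2 at t = 0 to 1."""
    inf_inverse = 1.0                   # limit of 1/lambda_S(t) as t grows
    sup_inverse = 1.0 / lambda_s(0.0)   # value of 1/lambda_S at t = 0
    lower = t / MU + inf_inverse / MU - 1.0
    upper = t / MU + sup_inverse / MU - 1.0
    return lower, upper

for t in (0.5, 1.0, 3.0, 10.0):
    lower, upper = marshall_bounds(t)
    print(t, lower, renewal_function(t), upper)
```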


7.3.3 Asymptotic Behavior 


This section investigates the behavior of the renewal counting process $\{N(t), t \ge 0\}$ and its trend function $H_1(t)$ as $t \to \infty$. The results allow the construction of estimates of the renewal function and of the probability distribution of $N(t)$ if $t$ is sufficiently large. Throughout this section, it is assumed that both $\mu_1 = E(Y_1)$ and $\mu = E(Y)$ are finite. Some of the key results require that the cycle length $Y$ or, equivalently, its distribution function, is nonarithmetic (see definition 5.3, page 216), i.e., that there is no positive constant $b$ with property that the possible values of $Y$ are multiples of $b$. A continuous random variable is always nonarithmetic.


A simple consequence of the strong law of large numbers is

$$P\left( \lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{\mu} \right) = 1. \tag{7.118}$$

To avoid technicalities, the verification of (7.118) is done for an ordinary renewal process: The inequality $T_{N(t)} \le t < T_{N(t)+1}$ implies that

$$\frac{T_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{T_{N(t)+1}}{N(t)} = \frac{T_{N(t)+1}}{N(t)+1} \cdot \frac{N(t)+1}{N(t)}$$

or, equivalently, that

$$\frac{1}{N(t)} \sum_{i=1}^{N(t)} Y_i \le \frac{t}{N(t)} < \left[ \frac{1}{N(t)+1} \sum_{i=1}^{N(t)+1} Y_i \right] \frac{N(t)+1}{N(t)}.$$

Since by assumption $\mu = E(Y) < \infty$, $N(t)$ tends to infinity as $t \to \infty$. Hence, theorem 5.4 yields the desired result (7.118). For $\mu$ being the mean distance between two renewals, this result is quite intuitive.

The following theorem considers the corresponding limit behavior of the mean value of $N(t)$. As with the subsequent theorems 7.13 and 7.14, no proof is given.

Theorem 7.12 (elementary renewal theorem) The renewal function satisfies

$$\lim_{t \to \infty} \frac{H_1(t)}{t} = \frac{1}{\mu}. \qquad ■$$

Corollary For large $t$,

$$H_1(t) \approx t/\mu.$$

The theorem shows that for $t \to \infty$ the influence of the first renewal interval with possibly $\mu_1 \ne \mu$ fades away. (For this property to be valid, the assumption $\mu_1 < \infty$ had to be made.) In terms of the renewal density, the analogue to theorem 7.12 is

$$\lim_{t \to \infty} h_1(t) = \frac{1}{\mu}.$$
Note that (7.118) does not imply theorem 7.12. The following theorem was called 
fundamental or key renewal theorem by its discoverer W. L. Smith. 
Theorem 7.13 (fundamental renewal theorem) If $F(t)$ is nonarithmetic and $g(t)$ an integrable function on $[0, \infty)$, then

$$\lim_{t \to \infty} \int_0^t g(t-x)\, dH_1(x) = \frac{1}{\mu} \int_0^{\infty} g(x)\, dx. \qquad ■$$

The fundamental renewal theorem (or key renewal theorem, theorem of Smith) has proved a useful tool for solving many problems in stochastic modeling. With

$$g(x) = \begin{cases} 1 & \text{for } 0 \le x \le h, \\ 0 & \text{elsewhere}, \end{cases}$$

the fundamental renewal theorem implies

Blackwell's renewal theorem: If $F(t)$ is nonarithmetic, then, for any $h > 0$,

$$\lim_{t \to \infty} [H_1(t+h) - H_1(t)] = \frac{h}{\mu}. \tag{7.119}$$

Whereas the elementary renewal theorem refers to a 'global transition' into the stationary regime, Blackwell's renewal theorem refers to the corresponding 'local behavior' in a time interval of length $h$.


Theorem 7.14 gives another variant of the fundamental renewal theorem. It refers to 
the integral equation of renewal type (7.106). 


Theorem 7.14 Let $a(x)$ be an integrable function on $[0, \infty)$ and $f(x)$ a probability density with mean value $\mu$. If a function $Z(t)$ satisfies the renewal type equation

$$Z(t) = a(t) + \int_0^t Z(t-x)\, f(x)\, dx, \tag{7.120}$$

then

$$\lim_{t \to \infty} Z(t) = \frac{1}{\mu} \int_0^{\infty} a(x)\, dx. \qquad ■$$

As mentioned previously, the function $Z(t)$ in (7.120) need not be a renewal function. Proofs of the now 'classic' theorems 7.12 to 7.14 can be found in Tijms (1994).

In the following example, theorem 7.14 is used to sketch the proof of the Cramér-Lundberg approximation for the ruin probability (7.83); for details see Grandell (1991).
Example 7.17 The integro-differential equation (7.77)

$$q'(x) = \frac{1}{\mu\kappa} \left[ q(x) - \int_0^x q(x-y)\, b(y)\, dy \right]$$

for the survival probability $q(x)$ of an insurance company can be transformed by integration on both sides and some routine manipulations to an integral equation for the ruin probability $p(x) = 1 - q(x)$:

$$p(x) = a_0(x) + \int_0^x p(x-y)\, g_0(y)\, dy \tag{7.121}$$

with

$$a_0(x) = 1 - \alpha - \frac{1}{\mu\kappa} \int_0^x \bar B(y)\, dy \quad \text{and} \quad g_0(y) = \frac{1}{\mu\kappa}\, \bar B(y),$$

where $\alpha$ is given by (7.79). Equation (7.121) is not of type (7.120), since $g_0(y)$ is only an 'incomplete' probability density:

$$\int_0^{\infty} g_0(y)\, dy = \frac{\nu}{\mu\kappa} = 1 - \alpha < 1.$$

For this reason, equation (7.121) is multiplied by the factor $e^{rx} = e^{r(x-y)} \cdot e^{ry}$, which transforms equation (7.121) into an integral equation for $p_r(x) = e^{rx} p(x)$:

$$p_r(x) = a(x) + \int_0^x p_r(x-y)\, g(y)\, dy, \tag{7.122}$$

where $a(x) = e^{rx} a_0(x)$, $g(y) = e^{ry} g_0(y)$, and $r$ is such that $g(y)$ is a probability density, i.e.,

$$\int_0^{\infty} g(y)\, dy = \frac{1}{\mu\kappa} \int_0^{\infty} e^{ry}\, \bar B(y)\, dy = 1.$$

This is the definition of the Lundberg exponent $r$ according to (7.84). Now (7.122) is a renewal type equation and theorem 7.14 can be applied: With

$$\gamma = \int_0^{\infty} y\, g(y)\, dy = \frac{1}{\mu\kappa} \int_0^{\infty} y\, e^{ry}\, \bar B(y)\, dy \quad \text{and} \quad \int_0^{\infty} a(x)\, dx = \frac{\alpha}{r},$$

theorem 7.14 yields

$$\lim_{x \to \infty} p_r(x) = \lim_{x \to \infty} e^{rx} p(x) = \frac{\alpha}{\gamma r}$$

so that for large $x$

$$p(x) \approx \frac{\alpha}{\gamma r}\, e^{-rx}. \qquad □$$
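Theorem 7.15 If $F(t)$ is nonarithmetic and $\sigma^2 = Var(Y) < \infty$, then

$$\lim_{t \to \infty} \left( H_1(t) - \frac{t}{\mu} \right) = \frac{\sigma^2}{2\mu^2} - \frac{\mu_1}{\mu} + \frac{1}{2}. \tag{7.123}$$

Proof The renewal equation (7.99) is equivalent to

$$H_1(t) = F_1(t) + \int_0^t F_1(t-x)\, dH(x). \tag{7.124}$$

If $F_1(t) \equiv F_S(t)$, then, by theorem 7.11, this integral equation becomes

$$\frac{t}{\mu} = F_S(t) + \int_0^t F_S(t-x)\, dH(x). \tag{7.125}$$

By subtracting integral equation (7.125) from integral equation (7.124),

$$H_1(t) - \frac{t}{\mu} = \bar F_S(t) - \bar F_1(t) + \int_0^t \bar F_S(t-x)\, dH(x) - \int_0^t \bar F_1(t-x)\, dH(x).$$

Applying the fundamental renewal theorem yields

$$\lim_{t \to \infty} \left( H_1(t) - \frac{t}{\mu} \right) = \frac{1}{\mu} \int_0^{\infty} \bar F_S(x)\, dx - \frac{1}{\mu} \int_0^{\infty} \bar F_1(x)\, dx.$$

Now the desired result follows from (2.52) and (7.112). ■

For ordinary renewal processes, (7.123) simplifies to

$$\lim_{t \to \infty} \left( H(t) - \frac{t}{\mu} \right) = \frac{1}{2} \left( \frac{\sigma^2}{\mu^2} - 1 \right). \tag{7.126}$$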


Corollary Under the assumptions of theorem 7.15, the fundamental renewal theorem 
implies the elementary renewal theorem. 


Theorem 7.16 For an ordinary renewal process, the integrated renewal function has property

$$\lim_{t \to \infty} \left\{ \int_0^t H(x)\, dx - \left[ \frac{t^2}{2\mu} + \left( \frac{\mu_2}{2\mu^2} - 1 \right) t \right] \right\} = \frac{\mu_2^2}{4\mu^3} - \frac{\mu_3}{6\mu^2}$$

with $\mu_2 = E(Y^2)$ and $\mu_3 = E(Y^3)$. ■

For a proof see, for instance, Tijms (1994). The following theorem is basically a consequence of the central limit theorem; for details see Karlin, Taylor (1981).


Theorem 7.17 The random number $N(t)$ of renewals in $[0, t]$ satisfies

$$\lim_{t \to \infty} P\left( \frac{N(t) - t/\mu}{\sigma \sqrt{t\, \mu^{-3}}} \le x \right) = \Phi(x). \qquad ■$$

Corollary For $t$ sufficiently large, $N(t)$ is approximately normally distributed with mean value $t/\mu$ and variance $\sigma^2 t / \mu^3$:

$$N(t) \approx N(t/\mu,\ \sigma^2 t / \mu^3). \tag{7.127}$$

Hence, theorem 7.17 can be used to construct approximate intervals, which contain $N(t)$ with a given probability: If $t$ is sufficiently large, then

$$P\left( \frac{t}{\mu} - z_{\alpha/2}\, \sigma \sqrt{t\, \mu^{-3}} \le N(t) \le \frac{t}{\mu} + z_{\alpha/2}\, \sigma \sqrt{t\, \mu^{-3}} \right) \approx 1 - \alpha. \tag{7.128}$$

As usual, $z_{\alpha/2}$ is the $(1 - \alpha/2)$-percentile of the standard normal distribution.

Example 7.18 Let $t = 1000$, $\mu = 10$, $\sigma = 2$, and $\alpha = 0.05$. Since $z_{0.025} \approx 2$,

$$P(96 \le N(t) \le 104) \approx 0.95. \qquad □$$


Knowledge of the asymptotic distribution of $N(t)$ makes it possible, without knowing the exact distribution of $Y$, to approximately answer a question which already arose in section 7.3.1: How many spare systems (spare parts) are necessary for guaranteeing that the (ordinary) renewal process can be maintained over an interval $[0, t]$ with a given probability of $1 - \alpha$? Since with probability $1 - \alpha$ approximately

$$\frac{N(t) - t/\mu}{\sigma \sqrt{t\, \mu^{-3}}} \le z_{\alpha},$$

for large $t$ the required number $n_{\min}$ is approximately equal to

$$n_{\min} \approx \frac{t}{\mu} + z_{\alpha}\, \sigma \sqrt{\frac{t}{\mu^3}}. \tag{7.129}$$

The same numerical parameters as in example 7.13 are considered: $t = 200$, $\mu = 8$, $\sigma^2 = 25$, and $\alpha = 0.01$. Since $z_{0.01} = 2.32$,

$$n_{\min} \ge \frac{200}{8} + 2.32 \cdot 5 \sqrt{200 \cdot 8^{-3}} = 32.25.$$

Thus, at least 33 spare parts are needed to make sure that with probability 0.99 the renewal process can be maintained over a period of 200 time units. Remember, formula (7.92) applied in example 7.13 yielded $n_{\min} = 34$. □
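The normal approximations (7.128) and (7.129) translate directly into a few lines of code. The sketch below uses `statistics.NormalDist` from the Python standard library for the percentiles; the function names are hypothetical. Because it uses the exact percentile 1.96 instead of the rounded value 2 of example 7.18, the interval comes out slightly narrower.

```python
import math
from statistics import NormalDist

def renewal_count_interval(t, mu, sigma, alpha):
    """Approximate interval (7.128) containing N(t) with probability 1 - alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma * math.sqrt(t / mu**3)
    return t / mu - half_width, t / mu + half_width

def min_spares(t, mu, sigma, alpha):
    """Approximate number of spare parts by (7.129), rounded up."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return math.ceil(t / mu + z * sigma * math.sqrt(t / mu**3))

print(renewal_count_interval(1000.0, 10.0, 2.0, 0.05))  # data of example 7.18
print(min_spares(200.0, 8.0, 5.0, 0.01))                # data of example 7.13
```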


7.3.4 Recurrence Times 


For any point process, recurrence times have been defined by (7.3) and (7.5). In particular, if $\{Y_1, Y_2, \ldots\}$ is a renewal process and $\{T_1, T_2, \ldots\}$ is the corresponding process of renewal time points, then its (random) forward recurrence time $A(t)$ is

$$A(t) = T_{N(t)+1} - t$$

and its (random) backward recurrence time $B(t)$ is

$$B(t) = t - T_{N(t)}.$$

With the interpretation of renewal processes adopted in this chapter, $A(t)$ is the residual lifetime and $B(t)$ the age of the system operating at time $t$ in the sense of the terminology introduced in section 2.3.4 (Figure 7.10). The stochastic processes

$$\{Y_1, Y_2, \ldots\},\ \{T_1, T_2, \ldots\},\ \{N(t), t \ge 0\},\ \{A(t), t \ge 0\},\ \text{and}\ \{B(t), t \ge 0\}$$

are statistically equivalent, since there is a one-to-one correspondence between their sample paths, i.e., each of these five processes can be used to define a renewal process (Figure 7.11).

Figure 7.10 Illustration of the recurrence times

Figure 7.11 Sample paths of the backward and forward recurrence time processes
Let

$$F_{A(t)}(x) = P(A(t) \le x) \quad \text{and} \quad F_{B(t)}(x) = P(B(t) \le x)$$

be the distribution functions of the forward and the backward recurrence times. Then, for $0 < x < t$, making use of (7.95),

$$\begin{aligned}
F_{A(t)}(x) &= P(T_{N(t)+1} - t \le x) \\
&= \sum_{n=0}^{\infty} P(T_{N(t)+1} \le t + x,\ N(t) = n) \\
&= F_1(t+x) - F_1(t) + \sum_{n=1}^{\infty} P(T_n \le t < T_{n+1} \le t + x) \\
&= F_1(t+x) - F_1(t) + \sum_{n=1}^{\infty} \int_0^t [F(x+t-y) - F(t-y)]\, dF_{T_n}(y) \\
&= F_1(t+x) - F_1(t) + \int_0^t [F(x+t-y) - F(t-y)]\, d\Big( \sum_{n=1}^{\infty} F_1 * F^{*(n-1)}(y) \Big) \\
&= F_1(t+x) - F_1(t) + \int_0^t [F(x+t-y) - F(t-y)]\, dH_1(y).
\end{aligned}$$

This representation of $F_{A(t)}(x)$ can be simplified by combining it with (7.100). The result is

$$F_{A(t)}(x) = F_1(t+x) - \int_0^t \bar F(x+t-y)\, dH_1(y); \quad x, t \ge 0. \tag{7.130}$$

Differentiation yields the probability density of $A(t)$:

$$f_{A(t)}(x) = f_1(t+x) + \int_0^t f(x+t-y)\, h_1(y)\, dy, \quad x, t \ge 0. \tag{7.131}$$

The probability that the system, which is working at time $t$, does not fail in $(t, t+x]$ is

$$\bar F_{A(t)}(x) = 1 - F_{A(t)}(x).$$

$\bar F_{A(t)}(x)$ is sometimes called interval reliability.


For determining the mean value of the forward recurrence time of an ordinary renewal process, $A(t)$ is written in the form

$$A(t) = \sum_{i=1}^{N(t)+1} Y_i - t,$$

where the $Y_1, Y_2, \ldots$ are independent and identically distributed as $Y$ with $\mu = E(Y)$. Wald's identity (4.74) at page 194 cannot be applied to obtain $E(A(t))$, since $N(t) + 1$ is surely not independent of the sequence $Y_1, Y_2, \ldots$. However, $N(t) + 1$ is a stopping time for the sequence $Y_1, Y_2, \ldots$:

$$\text{'}N(t) + 1 = n\text{'} = \text{'}N(t) = n - 1\text{'} = \text{'}Y_1 + Y_2 + \cdots + Y_{n-1} \le t < Y_1 + Y_2 + \cdots + Y_n\text{'}.$$

Thus, the event '$N(t) + 1 = n$' is independent of all $Y_{n+1}, Y_{n+2}, \ldots$ so that, by definition 4.2, $N(t) + 1$ is a stopping time for the sequence $Y_1, Y_2, \ldots$. Hence, the mean value of $A(t)$ can be obtained from (4.76) at page 195 with $N = N(t) + 1$:

$$E(A(t)) = \mu\, [E(N(t)) + 1] - t.$$

Thus, the mean forward recurrence time of an ordinary renewal process is

$$E(A(t)) = \mu\, [H(t) + 1] - t.$$
The probability distribution of the backward recurrence time is obtained as follows:

$$\begin{aligned}
F_{B(t)}(x) &= P(t - x \le T_{N(t)}) \\
&= \sum_{n=1}^{\infty} P(t - x \le T_n,\ N(t) = n) \\
&= \sum_{n=1}^{\infty} P(t - x \le T_n \le t < T_{n+1}) \\
&= \sum_{n=1}^{\infty} \int_{t-x}^{t} \bar F(t-u)\, dF_{T_n}(u) \\
&= \int_{t-x}^{t} \bar F(t-u)\, d\Big( \sum_{n=1}^{\infty} F_1 * F^{*(n-1)}(u) \Big) \\
&= \int_{t-x}^{t} \bar F(t-u)\, dH_1(u).
\end{aligned}$$

Hence, the distribution function of $B(t)$ is

$$F_{B(t)}(x) = \begin{cases} \displaystyle\int_{t-x}^{t} \bar F(t-u)\, dH_1(u) & \text{for } 0 \le x \le t, \\ 1 & \text{for } x > t. \end{cases} \tag{7.132}$$

Differentiation yields the probability density of $B(t)$:

$$f_{B(t)}(x) = \begin{cases} \bar F(x)\, h_1(t-x) & \text{for } 0 \le x \le t, \\ 0 & \text{for } t < x. \end{cases} \tag{7.133}$$

One easily verifies that the forward and backward recurrence times of an ordinary renewal process, whose cycle lengths are exponentially distributed with parameter $\lambda$, are also exponentially distributed with parameter $\lambda$:

$$f_{A(t)}(x) = f_{B(t)}(x) = \lambda e^{-\lambda x} \quad \text{for all } t > 0.$$

In view of the memoryless property of the exponential distribution (example 2.21, page 87), this result is not surprising.

A direct consequence of the fundamental renewal theorem is that $F_S(t)$, as defined by (7.111), is the limiting distribution function of both backward and forward recurrence time as $t$ tends to infinity:

$$\lim_{t \to \infty} F_{A(t)}(x) = \lim_{t \to \infty} F_{B(t)}(x) = F_S(x), \quad x \ge 0. \tag{7.134}$$
t>o0 t>00 


Paradox of Renewal Theory In view of the definition of the forward recurrence time, one may suppose that the following equation is true:

$$\lim_{t \to \infty} E(A(t)) = \mu / 2.$$

However, according to (7.134) and (7.112),

$$\lim_{t \to \infty} E(A(t)) = \int_0^{\infty} \bar F_S(t)\, dt = E(S) = \frac{\mu^2 + \sigma^2}{2\mu} > \frac{\mu}{2}.$$

This 'contradiction' is known as the paradox of renewal theory. The intuitive explanation of this phenomenon is that on average the 'reference time point' $t$ is to be found more frequently in longer renewal cycles than in shorter ones.
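The paradox can be made concrete with Erlang distributed cycle lengths ($m = 2$), for which the renewal function is known in closed form, $H(t) = \frac{1}{2}[\lambda t - \frac{1}{2} + \frac{1}{2} e^{-2\lambda t}]$. The following sketch, with hypothetical function names and the assumed parameter $\lambda = 1$, evaluates $E(A(t)) = \mu [H(t) + 1] - t$ and compares its value for large $t$ with $(\mu^2 + \sigma^2)/(2\mu)$ and with the naive guess $\mu/2$.

```python
import math

def renewal_function_erlang2(t, lam):
    """Renewal function for Erlang cycle lengths with m = 2."""
    return 0.5 * (lam * t - 0.5 + 0.5 * math.exp(-2.0 * lam * t))

def mean_forward_recurrence(t, lam):
    """E(A(t)) = mu * [H(t) + 1] - t with mu = 2 / lam."""
    mu = 2.0 / lam
    return mu * (renewal_function_erlang2(t, lam) + 1.0) - t

def limiting_mean(lam):
    """(mu**2 + sigma**2) / (2 * mu) for Erlang cycle lengths with m = 2."""
    mu = 2.0 / lam
    var = 2.0 / lam**2
    return (mu**2 + var) / (2.0 * mu)

lam = 1.0
print(mean_forward_recurrence(0.0, lam))   # equals mu at t = 0
print(mean_forward_recurrence(50.0, lam))  # close to the limit
print(limiting_mean(lam), (2.0 / lam) / 2.0)
```

For $\lambda = 1$ the limit is $1.5$, whereas $\mu/2 = 1$; the gap is the term $\sigma^2/(2\mu)$.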


7.3.5 Stationary Renewal Processes 


By definition 7.1, a renewal process $\{Y_1, Y_2, \ldots\}$ is stationary if for all $k = 1, 2, \ldots$, any sequence of integers $i_1, i_2, \ldots, i_k$ with $1 \le i_1 < i_2 < \cdots < i_k$, and any $\tau = 0, 1, \ldots$ the joint distribution functions of the vectors

$$(Y_{i_1}, Y_{i_2}, \ldots, Y_{i_k}) \quad \text{and} \quad (Y_{i_1+\tau}, Y_{i_2+\tau}, \ldots, Y_{i_k+\tau})$$

coincide. According to the corollary after definition 7.1, $\{Y_1, Y_2, \ldots\}$ is stationary if and only if the corresponding renewal counting process $\{N(t), t \ge 0\}$ has homogeneous increments. A third way of defining the stationarity of a renewal process $\{Y_1, Y_2, \ldots\}$ makes use of the statistical equivalence between $\{Y_1, Y_2, \ldots\}$ and the corresponding processes $\{A(t), t \ge 0\}$ or $\{B(t), t \ge 0\}$, respectively.

A renewal process is stationary if and only if the process of its forward (backward) recurrence times $\{A(t), t \ge 0\}$ ($\{B(t), t \ge 0\}$) is strongly stationary.

The stochastic process in continuous time $\{B(t), t \ge 0\}$ is a Markov process. This is quite intuitive, but a strict proof will not be given here. By theorem 7.1, a Markov process $\{X(t), t \in T\}$ is strongly stationary if and only if its one-dimensional distribution functions $F_t(x) = P(X(t) \le x)$ do not depend on $t$. Hence, a renewal process is stationary if and only if there is a distribution function $F(x)$ so that

$$F_{A(t)}(x) = P(A(t) \le x) = F(x) \quad \text{for all } x \ge 0 \text{ and } t \ge 0.$$


Theorem 7.18 yields a simple criterion for the stationarity of renewal processes:

Theorem 7.18 Let $F(x) = P(Y \le x)$ be nonarithmetic and $\mu = E(Y) < \infty$. Then a delayed renewal process given by $F_1(x)$ and $F(x)$ is stationary if and only if

$$H_1(t) = t/\mu. \tag{7.135}$$

Equivalently, as a consequence of theorem 7.11, a delayed renewal process is stationary if and only if

$$F_1(x) \equiv F_S(x) = \frac{1}{\mu} \int_0^x \bar F(y)\, dy \quad \text{for all } x \ge 0. \tag{7.136}$$

Proof If (7.136) holds, then (7.135) holds as well, so that, from (7.130),

$$\begin{aligned}
F_{A(t)}(x) &= \frac{1}{\mu} \int_0^{t+x} \bar F(y)\, dy - \frac{1}{\mu} \int_0^t \bar F(x+t-y)\, dy \\
&= \frac{1}{\mu} \int_0^{t+x} \bar F(y)\, dy - \frac{1}{\mu} \int_x^{t+x} \bar F(y)\, dy \\
&= \frac{1}{\mu} \int_0^x \bar F(y)\, dy.
\end{aligned}$$

Hence, $F_{A(t)}(x)$ does not depend on $t$.

Conversely, if $F_{A(t)}(x)$ does not depend on $t$, then (7.134) implies

$$F_{A(t)}(x) \equiv F_S(x) \quad \text{for all } t.$$

This completes the proof of the theorem. ■

As a consequence of theorem 7.18 and the elementary renewal theorem, after a sufficiently large time span (transient response time) every renewal process with nonarithmetic distribution function $F(t)$ and finite mean cycle length $\mu = E(Y)$ behaves as a stationary renewal process.


7.3.6 Alternating Renewal Processes 


So far it has been assumed that renewals take only negligibly small amounts of time. 
In order to be able to model practical situations, in which this assumption is not ful- 
filled, the concept of a renewal process has to be generalized in the following way: 
The renewal time of the system after its $i$th failure is assumed to be a positive random variable $Z_i$; $i = 1, 2, \ldots$. Immediately after a renewal the system starts operating. In this way, a marked point process $\{(Y_i, Z_i);\ i = 1, 2, \ldots\}$ is generated, where $Y_i$ as before denotes the lifetime of the system after the $i$th renewal.

Figure 7.12 Sample path of an alternating renewal process

Definition 7.8 (alternating renewal process) If $\{Y_1, Y_2, \ldots\}$ and $\{Z_1, Z_2, \ldots\}$ are two independent sequences of independent, nonnegative random variables, then the marked point process $\{(Y_1, Z_1), (Y_2, Z_2), \ldots\}$ is said to be an alternating renewal process if the $Y_i$ and the $Z_i$ have the meanings given above.

The random variables

$$S_1 = Y_1; \quad S_n = \sum_{i=1}^{n-1} (Y_i + Z_i) + Y_n, \quad n = 2, 3, \ldots$$

are the time points at which failures occur, and the random variables

$$T_n = \sum_{i=1}^{n} (Y_i + Z_i); \quad n = 1, 2, \ldots$$

are the time points at which a new system starts operating. If an operating system is assigned a '1' and a failed system a '0', a binary indicator variable of the system state is

$$X(t) = \begin{cases} 0 & \text{if } t \in [S_n, T_n),\ n = 1, 2, \ldots, \\ 1 & \text{elsewhere}. \end{cases} \tag{7.137}$$

Obviously, an alternating renewal process can equivalently be defined by the stochastic process in continuous time $\{X(t), t \ge 0\}$ with $X(t)$ given by (7.137) (Figure 7.12).


In what follows, all $Y_i$ and $Z_i$ are assumed to be distributed as $Y$ and $Z$ with distribution functions $F_Y(y) = P(Y \le y)$ and $F_Z(z) = P(Z \le z)$, respectively. By agreement, $P(X(+0) = 1) = 1$.

Analogously to the concept of a delayed renewal process, the alternating renewal process can be generalized by assigning to the random lifetime $Y_1$ a probability distribution different from that of $Y$. This way of generalization and some other possibilities will not be discussed here, although no principal difficulties would arise.

Let $N_f(t)$ and $N_r(t)$ be the respective numbers of failures and renewals in $(0, t]$. Since $S_n$ and $T_n$ are sums of independent random variables,

$$F_{S_n}(t) = P(S_n \le t) = P(N_f(t) \ge n) = F_Y * (F_Y * F_Z)^{*(n-1)}(t), \tag{7.138}$$
$$F_{T_n}(t) = P(T_n \le t) = P(N_r(t) \ge n) = (F_Y * F_Z)^{*(n)}(t). \tag{7.139}$$

Analogously to (7.95) and (7.96), sum representations of the mean values

$$H_f(t) = E(N_f(t)) \quad \text{and} \quad H_r(t) = E(N_r(t))$$

are

$$H_f(t) = \sum_{n=1}^{\infty} F_Y * (F_Y * F_Z)^{*(n-1)}(t)$$

and

$$H_r(t) = \sum_{n=1}^{\infty} (F_Y * F_Z)^{*(n)}(t).$$

$H_f(t)$ and $H_r(t)$ are referred to as the renewal functions of the alternating renewal process. Since $H_f(t)$ can be interpreted as the renewal function of a delayed renewal process, whose first system lifetime is distributed as $Y$, whereas the following 'system lifetimes' are identically distributed as $Y + Z$, it satisfies renewal equation (7.97) with

$$F_1(t) \equiv F_Y(t) \quad \text{and} \quad F(t) = F_Y * F_Z(t).$$

Analogously, $H_r(t)$ can be interpreted as the renewal function of an ordinary renewal process whose cycle lengths are identically distributed as $Y + Z$. Therefore, $H_r(t)$ satisfies renewal equation (7.98) with $F(t)$ replaced by $F_Y * F_Z(t)$.
Let $R_t$ be the residual lifetime of the system if it is operating at time $t$. Then

$$P(X(t) = 1,\ R_t > x)$$

is the probability that the system is working at time $t$ and does not fail in the interval $(t, t+x]$. This probability is called interval availability or interval reliability, and it is denoted as $A_x(t)$. It can be obtained as follows:

$$\begin{aligned}
A_x(t) &= P(X(t) = 1,\ R_t > x) \\
&= \bar F_Y(t+x) + \sum_{n=1}^{\infty} P(T_n \le t,\ T_n + Y_{n+1} > t + x) \\
&= \bar F_Y(t+x) + \int_0^t P(t + x - u < Y)\, d \sum_{n=1}^{\infty} (F_Y * F_Z)^{*(n)}(u).
\end{aligned}$$

Hence,

$$A_x(t) = \bar F_Y(t+x) + \int_0^t \bar F_Y(t+x-u)\, dH_r(u). \tag{7.140}$$

Note In this section 'A' no longer refers to the forward recurrence time.

Let $A(t)$ be the probability that the system is operating (available) at time $t$:

$$A(t) = P(X(t) = 1). \tag{7.141}$$

This important characteristic of an alternating renewal process is obtained from (7.140) by letting $x = 0$:

$$A(t) = \bar F_Y(t) + \int_0^t \bar F_Y(t-u)\, dH_r(u). \tag{7.142}$$
$A(t)$ is called availability of the system, system availability, or, more exactly, point availability of the system, since it refers to a specific time point $t$. It is equal to the mean value of the indicator variable of the system state:

$$E(X(t)) = 1 \cdot P(X(t) = 1) + 0 \cdot P(X(t) = 0) = P(X(t) = 1) = A(t).$$

The average availability of the system in the interval $[0, t]$ is

$$\bar A(t) = \frac{1}{t} \int_0^t A(x)\, dx.$$

The random total operating time $U(t)$ of the system in the interval $[0, t]$ is

$$U(t) = \int_0^t X(x)\, dx. \tag{7.143}$$

By changing the order of integration,

$$E(U(t)) = E\left( \int_0^t X(x)\, dx \right) = \int_0^t E(X(x))\, dx.$$

Thus,

$$E(U(t)) = \int_0^t A(x)\, dx = t\, \bar A(t).$$

The following theorem provides information on the limiting behavior of the interval reliability and the point availability as $t \to \infty$. A proof of the assertions need not be given, since they are immediate consequences of theorem 7.13.

Theorem 7.19 If $E(Y) + E(Z) < \infty$ and the distribution function $(F_Y * F_Z)(t)$ of the sum $Y + Z$ is nonarithmetic, then

$$A_x = \lim_{t \to \infty} A_x(t) = \frac{1}{E(Y) + E(Z)} \int_x^{\infty} \bar F_Y(u)\, du,$$
$$A = \lim_{t \to \infty} A(t) = \lim_{t \to \infty} \bar A(t) = \frac{E(Y)}{E(Y) + E(Z)}. \tag{7.144}$$
■

$A_x$ is said to be the long-run or stationary interval availability (reliability) with regard to an interval of length $x$, and $A$ is called the long-run or stationary availability. Clearly, $A = A_0$. If, analogously to renewal processes, the time between two neighboring time points at which a new system starts operating is called a renewal cycle, then the long-run availability is equal to the mean share of the operating time of a system in the mean renewal cycle length. Equation (7.144) is also valid if within renewal cycles $Y_i$ and $Z_i$ depend on each other.


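Example 7.19 Life- and renewal times have exponential distributions with densities

$$f_Y(y) = \lambda e^{-\lambda y},\ y \ge 0, \quad \text{and} \quad f_Z(z) = \mu e^{-\mu z},\ z \ge 0.$$

The Laplace transforms of these densities and of

$$\bar F_Y(y) = e^{-\lambda y}, \quad y \ge 0,$$

are

$$\hat f_Y(s) = \frac{\lambda}{s + \lambda}, \quad \hat f_Z(s) = \frac{\mu}{s + \mu}, \quad \text{and} \quad L\{\bar F_Y, s\} = \frac{1}{s + \lambda}.$$

Application of the Laplace transform to the integral equation (7.142) yields

$$\hat A(s) = L\{\bar F_Y, s\} + L\{\bar F_Y, s\} \cdot \hat h_r(s) = \frac{1}{s + \lambda} \left[ 1 + \hat h_r(s) \right]. \tag{7.145}$$

By (2.127), the Laplace transform of the convolution $(f_Y * f_Z)(t)$ is

$$L\{f_Y * f_Z, s\} = \hat f_Y(s) \cdot \hat f_Z(s) = \frac{\lambda \mu}{(s + \lambda)(s + \mu)}.$$

From the second equation of (7.104),

$$\hat h_r(s) = \frac{\lambda \mu}{s\,(s + \lambda + \mu)}.$$

By inserting $\hat h_r(s)$ into (7.145) and expanding $\hat A(s)$ into partial fractions,

$$\hat A(s) = \frac{1}{s + \lambda + \mu} + \frac{\mu}{s\,(s + \lambda + \mu)}.$$

Retransformation (use Table 2.5, page 105) yields the point availability

$$A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)\, t}, \quad t \ge 0.$$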


Since

$$E(Y) = 1/\lambda \quad \text{and} \quad E(Z) = 1/\mu,$$

taking in $A(t)$ the limit as $t \to \infty$ verifies relationship (7.144). On the other hand, if $\lambda \ne \mu$, as derived in example 4.14 (page 174),

$$E\left( \frac{Y}{Y + Z} \right) = \frac{\mu}{\mu - \lambda} \left( 1 + \frac{\lambda}{\mu - \lambda} \ln \frac{\lambda}{\mu} \right).$$

For instance, if $E(Z) = 0.25\, E(Y)$, then

$$A = \frac{E(Y)}{E(Y) + E(Z)} = 0.800$$

and

$$E\left( \frac{Y}{Y + Z} \right) = 0.717.$$

Hence, in general,

$$E\left( \frac{Y}{Y + Z} \right) \ne \frac{E(Y)}{E(Y) + E(Z)}. \qquad □$$

Usually, numerical methods have to be applied to determine interval and point availability when applying formulas (7.140) and (7.142). This is again due to the fact that there are either no explicit or rather complicated representations of the renewal function for most of the common lifetime distributions. These formulas can, however, be applied for obtaining approximate values for interval and point availability if they are used in conjunction with the bounds and approximations for the renewal function given in sections 7.3.2.2 and 7.3.3.
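The closed-form results of example 7.19 can be reproduced with a short script. This is a sketch only; the function names and the parameter choice $\lambda = 1$, $\mu = 4$ (which gives $E(Z) = 0.25\, E(Y)$) are assumptions made for illustration.

```python
import math

def point_availability(t, lam, mu):
    """A(t) for exponential life- and renewal times (example 7.19)."""
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)

def stationary_availability(lam, mu):
    """E(Y) / (E(Y) + E(Z)) with E(Y) = 1/lam and E(Z) = 1/mu."""
    return (1.0 / lam) / (1.0 / lam + 1.0 / mu)

def mean_ratio(lam, mu):
    """E(Y / (Y + Z)) for lam != mu."""
    return mu / (mu - lam) * (1.0 + lam / (mu - lam) * math.log(lam / mu))

lam, mu = 1.0, 4.0
print(point_availability(0.0, lam, mu))    # system starts in state 1
print(point_availability(10.0, lam, mu))   # practically the stationary value
print(stationary_availability(lam, mu), mean_ratio(lam, mu))
```

The transient term decays with rate $\lambda + \mu$, so the point availability is close to its stationary value after a few multiples of $1/(\lambda + \mu)$.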
7.3.7 Compound Renewal Processes


7.3.7.1 Definition and Properties 


Compound stochastic processes arise by additive superposition of random variables 
at random time points. (For motivation, see section 7.2.5.) 


Definition 7.9 Let $\{(T_1, M_1), (T_2, M_2), \ldots\}$ be a random marked point process with property that $\{T_1, T_2, \ldots\}$ is the sequence of renewal time points of a renewal process $\{Y_1, Y_2, \ldots\}$, and let $\{N(t), t \ge 0\}$ be the corresponding renewal counting process. Then the stochastic process $\{C(t), t \ge 0\}$ defined by

$$C(t) = \begin{cases} \displaystyle\sum_{i=1}^{N(t)} M_i & \text{if } N(t) \ge 1, \\ 0 & \text{if } N(t) = 0 \end{cases} \tag{7.146}$$

is called a compound (aggregate, cumulative) renewal process, and $C(t)$ is called a compound random variable.

The compound Poisson process defined in section 7.2.5 is a compound renewal process with property that the renewal cycle lengths $Y_i = T_i - T_{i-1}$, $i = 1, 2, \ldots$, are independent and identically exponentially distributed (theorem 7.2).

A compound renewal process is also called a renewal reward process, in particular if $M_i$ is a 'profit' of any kind made at the renewal time points. In most applications, however, $M_i$ is a 'loss', for instance, replacement cost, repair time, or claim size. But it also can represent a 'loss' or 'gain', which accumulates over the $i$th renewal cycle (maintenance cost, profit by operating the system). In any case, $C(t)$ is the total loss (gain), which has accumulated over the interval $(0, t]$. The sample paths of a compound renewal process are step functions. Jumps occur at times $T_i$ and the respective jump heights are $M_i$ (Figure 7.13).

[Figure 7.13 Sample path of a compound process with positive increments]

In this section, compound renewal processes are considered under the following assumptions:
1) \(\{N(t), t \ge 0\}\) is a renewal counting process, which belongs to an ordinary renewal process \(\{Y_1, Y_2, ...\}\).

2) The sequences \(\{M_1, M_2, ...\}\) and \(\{Y_1, Y_2, ...\}\) are independent of each other and each consist of independent, nonnegative random variables, which are identically distributed as \(M\) and \(Y\), respectively. \(M_i\) and \(Y_j\) are allowed to depend on each other if \(i = j\), i.e., if they refer to the same renewal cycle.

3) The mean values of \(Y\) and \(M\) are finite and positive.


Under these assumptions, Wald's equation (4.74) yields the trend function of the compound renewal process \(\{C(t), t \ge 0\}\):

\[m(t) = E(C(t)) = E(M)\, H(t), \tag{7.147}\]

where \(H(t) = E(N(t))\) is the renewal function, which belongs to the underlying renewal process \(\{Y_1, Y_2, ...\}\). Formula (7.147) and theorem 7.12, the elementary renewal theorem, imply an important asymptotic property of the trend function of compound renewal processes:

\[\lim_{t \to \infty} \frac{E(C(t))}{t} = \frac{E(M)}{E(Y)}. \tag{7.148}\]

Equation (7.148) means that the average long-run (stationary) loss or profit per unit time is equal to the average loss or profit per unit time within a renewal cycle. The 'stochastic analog' to (7.148) is: With probability 1,

\[\lim_{t \to \infty} \frac{C(t)}{t} = \frac{E(M)}{E(Y)}. \tag{7.149}\]

To verify (7.149), consider the obvious relationship

\[\sum_{i=1}^{N(t)} M_i \le C(t) \le \sum_{i=1}^{N(t)+1} M_i.\]

From this,

\[\left(\frac{1}{N(t)} \sum_{i=1}^{N(t)} M_i\right) \frac{N(t)}{t} \le \frac{C(t)}{t} \le \left(\frac{1}{N(t)+1} \sum_{i=1}^{N(t)+1} M_i\right) \frac{N(t)+1}{t}.\]

Now the strong law of large numbers (theorem 5.4) and (7.118) imply (7.149). The relationships (7.148) and (7.149) are called renewal reward theorems.
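The renewal reward theorem (7.149) can be illustrated by simulating one long sample path. The following Python sketch is not part of the original text; the cycle length distribution (uniform on [0, 2], so E(Y) = 1) and the mark distribution (exponential with E(M) = 3) are assumptions chosen only for illustration, and the function name is hypothetical.

```python
import random

def simulate_compound_renewal(t_end, draw_cycle, draw_mark, rng):
    """Return C(t_end) = M_1 + ... + M_N(t_end) for one simulated path."""
    clock, total = 0.0, 0.0
    while True:
        clock += draw_cycle(rng)      # next renewal time point T_i
        if clock > t_end:             # this renewal falls outside (0, t_end]
            return total
        total += draw_mark(rng)       # jump of height M_i at time T_i

rng = random.Random(12345)
t_end = 20_000.0
c_t = simulate_compound_renewal(
    t_end,
    draw_cycle=lambda r: r.uniform(0.0, 2.0),       # assumed: E(Y) = 1
    draw_mark=lambda r: r.expovariate(1.0 / 3.0),   # assumed: E(M) = 3
    rng=rng,
)
print(c_t / t_end)   # long-run average per unit time, to compare with E(M)/E(Y) = 3
```

With these assumed distributions the ratio C(t)/t should settle near 3 as t grows; the simulation only illustrates the limit and proves nothing.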


Distribution of \(C(t)\) If \(M\) has distribution function \(G(t)\), then, given \(N(t) = n\), the compound random variable \(C(t)\) has distribution function

\[P(C(t) \le x \mid N(t) = n) = G^{*(n)}(x),\]

where \(G^{*(n)}(x)\) is the \(n\)th convolution power of \(G(t)\). Hence, by the total probability rule,

\[F_{C(t)}(x) = P(C(t) \le x) = \sum_{n=0}^{\infty} G^{*(n)}(x)\, P(N(t) = n), \tag{7.150}\]

where the probabilities \(P(N(t) = n)\) are given by (7.90) and \(G^{*(0)}(x) \equiv 1\). (With the terminology of section 2.4, \(F_{C(t)}\) is a mixture of the probability distribution functions \(G^{*(1)}, G^{*(2)}, ...\).) If \(Y\) has an exponential distribution with parameter \(\lambda\), then \(C(t)\) has distribution function

\[F_{C(t)}(x) = e^{-\lambda t} \sum_{n=0}^{\infty} G^{*(n)}(x)\, \frac{(\lambda t)^n}{n!}; \quad G^{*(0)}(x) \equiv 1, \ x > 0, \ t > 0. \tag{7.151}\]

If, in addition, \(M\) has a normal distribution with \(E(M) \ge 3\sqrt{Var(M)}\), then

\[F_{C(t)}(x) = e^{-\lambda t} \left[ 1 + \sum_{n=1}^{\infty} \Phi\!\left( \frac{x - n E(M)}{\sqrt{n\, Var(M)}} \right) \frac{(\lambda t)^n}{n!} \right], \quad x > 0, \ t > 0. \tag{7.152}\]
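For moderate values of λt, the series for Poisson arrivals and normally distributed M can be evaluated directly by truncating the Poisson weights. The following Python sketch is an illustration under that assumption; the function name, the truncation rule and the numerical parameters are hypothetical and not taken from the text.

```python
import math
from statistics import NormalDist

def compound_poisson_normal_cdf(x, t, lam, mean_m, sd_m, tol=1e-12):
    """P(C(t) <= x) for Poisson arrivals (rate lam) and normal marks, x > 0.

    Intended for moderate lam * t only: exp(-lam * t) must not underflow.
    """
    phi = NormalDist().cdf
    weight = math.exp(-lam * t)     # P(N(t) = 0)
    total = weight                  # n = 0 term, since G^{*(0)}(x) = 1
    cumulative = weight             # Poisson probability mass used so far
    n = 0
    while 1.0 - cumulative > tol and n < 10_000:
        n += 1
        weight *= lam * t / n       # P(N(t) = n) from P(N(t) = n - 1)
        cumulative += weight
        # the n-fold convolution of N(mean, sd^2) is N(n * mean, n * sd^2)
        z = (x - n * mean_m) / (sd_m * math.sqrt(n))
        total += phi(z) * weight
    return total

# Assumed illustration: lam * t = 5 and E(M) = 9 = 3 * sd, so M is practically nonnegative
print(compound_poisson_normal_cdf(45.0, t=10.0, lam=0.5, mean_m=9.0, sd_m=3.0))
```

The loop stops as soon as the neglected Poisson tail is below tol, so the truncation error of the returned value is at most tol.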


The distribution function \(F_{C(t)}\), being composed of convolution powers of \(G\) and \(F\), is usually not tractable for numerical applications. Hence, much effort has been put into constructing bounds on \(F_{C(t)}\) and into establishing asymptotic expansions. For surveys, see, e.g., Rolski et al. (1999) and Willmot, Lin (2001). The following result of Gut (1990) is particularly useful.


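Theorem 7.20 If

\[\gamma^2 = Var\{E(Y)\, M - E(M)\, Y\} > 0, \tag{7.153}\]

then

\[\lim_{t \to \infty} P\!\left( \frac{C(t) - \frac{E(M)}{E(Y)}\, t}{[E(Y)]^{-3/2}\, \gamma \sqrt{t}} \le x \right) = \Phi(x),\]

where \(\Phi(x)\) is the distribution function of the standardized normal distribution. ■

This theorem implies that for large \(t\) the compound variable \(C(t)\) has approximately a normal distribution with mean value and variance

\[\frac{E(M)}{E(Y)}\, t \quad \text{and} \quad [E(Y)]^{-3}\, \gamma^2\, t,\]

respectively:

\[C(t) \approx N\!\left( \frac{E(M)}{E(Y)}\, t,\ [E(Y)]^{-3}\, \gamma^2\, t \right). \tag{7.154}\]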


If \(M\) and \(Y\) are independent, then the parameter \(\gamma^2\) can be written in the following form:

\[\gamma^2 = [E(Y)]^2\, Var(M) + [E(M)]^2\, Var(Y). \tag{7.155}\]

In this case, in view of assumption 3, condition (7.153) is always fulfilled. Condition (7.153) actually only excludes the case \(\gamma^2 = 0\), i.e. linear dependence between \(Y\) and \(M\). The following examples present applications of theorem 7.20.
Example 7.20 For an alternating renewal process \(\{(Y_i, Z_i);\ i = 1, 2, ...\}\), the total renewal time in \((0, t]\) is given by (a possible renewal time running at time \(t\) is neglected)

\[C(t) = \sum_{i=1}^{N(t)} Z_i,\]

where

\[N(t) = \max\{n,\ T_n < t\}.\]

(Notation and assumptions as in section 7.3.6.) Hence, the development of the total renewal time is governed by a compound stochastic process. In order to investigate the asymptotic behaviour of \(C(t)\) as \(t \to \infty\) by means of theorem 7.20, \(M\) has to be replaced with \(Z\) and \(Y\) with \(Y + Z\). Consequently, if \(t\) is sufficiently large, then \(C(t)\) has approximately a normal distribution with parameters

\[E(C(t)) = \frac{E(Z)}{E(Y) + E(Z)}\, t \quad \text{and} \quad Var(C(t)) = \frac{\gamma^2}{[E(Y) + E(Z)]^3}\, t.\]

Because of the independence of \(Y\) and \(Z\),

\[\gamma^2 = Var[Z\, E(Y + Z) - (Y + Z)\, E(Z)] = Var[Z\, E(Y) - Y\, E(Z)] = [E(Y)]^2\, Var(Z) + [E(Z)]^2\, Var(Y) > 0\]

so that assumption (7.153) is satisfied. In particular, let (all parameters in hours)

\[E(Y) = 120, \ \sqrt{Var(Y)} = 40, \quad \text{and} \quad E(Z) = 4, \ \sqrt{Var(Z)} = 2.\]

Then,

\[\gamma^2 = 120^2 \cdot 4 + 16 \cdot 1600 = 83200 \quad \text{and} \quad \gamma = 288.4.\]

Consider for example the total renewal time in the interval \([0, 10^4\ \text{hours}]\). The probability that \(C(10^4)\) does not exceed a nominal value of 350 hours is

\[P(C(10^4) \le 350) = \Phi\!\left( \frac{350 - \frac{4}{124} \cdot 10^4}{124^{-3/2} \cdot 288.4 \cdot \sqrt{10^4}} \right) = \Phi(1.313).\]

Hence,

\[P(C(10^4) \le 350) = 0.905. \qquad \square\]
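The arithmetic of example 7.20 can be reproduced with the Python standard library. This is a sketch only; the helper name compound_normal_approx is illustrative, and the inputs are the numbers stated in the example.

```python
import math
from statistics import NormalDist

def compound_normal_approx(t, mean_cycle, mean_mark, gamma):
    """Mean and standard deviation of C(t) under the normal approximation (7.154)."""
    mean = mean_mark / mean_cycle * t
    std = mean_cycle ** -1.5 * gamma * math.sqrt(t)
    return mean, std

e_y, sd_y = 120.0, 40.0    # operating time Y [h]
e_z, sd_z = 4.0, 2.0       # renewal time Z [h]

# gamma^2 = [E(Y)]^2 Var(Z) + [E(Z)]^2 Var(Y) for independent Y and Z
gamma = math.sqrt(e_y**2 * sd_z**2 + e_z**2 * sd_y**2)

# Theorem 7.20 with M replaced by Z and Y replaced by Y + Z
mean, std = compound_normal_approx(1e4, e_y + e_z, e_z, gamma)
z = (350.0 - mean) / std
prob = NormalDist().cdf(z)
print(round(gamma, 1), round(z, 3), round(prob, 3))   # 288.4 1.313 0.905
```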


Example 7.21 (normal approximation to risk processes) Let the sequence of the claim interarrival times \(Y_1, Y_2, ...\) be an ordinary renewal process. This includes the homogeneous Poisson arrival process, to which section 7.2.7 is restricted. Otherwise, assumptions 2 to 4 (page 294) and the notation introduced there will be retained. Then, by theorem 7.20, if \(t\) is sufficiently large compared to \(\mu = E(Y)\), the total claim size arising in \([0, t]\) has approximately a normal distribution with mean value \(\frac{\nu}{\mu}\, t\) and variance \(\mu^{-3} \gamma^2 t\):

\[C(t) \approx N\!\left( \frac{\nu}{\mu}\, t,\ \mu^{-3} \gamma^2 t \right), \tag{7.156}\]

where

\[\gamma^2 = \mu^2\, Var(M) + \nu^2\, Var(Y).\]

The random profit \(G(t)\) the insurance company has made in \([0, t]\) is given by

\[G(t) = \kappa t - C(t).\]

By (7.156), \(G(t)\) has approximately a normal distribution with parameters

\[E(G(t)) = \left( \kappa - \frac{\nu}{\mu} \right) t \quad \text{and} \quad Var(G(t)) = \mu^{-3} \gamma^2 t.\]

Note that the situation considered here is that, when being 'in red numbers' (ruin has happened), the company continues operating until it reaches a profitable time period, and so on. In case of a positive safety loading the company will leave 'loss periods' with probability 1.

As a numerical special case, let us consider a risk process \(\{(Y_1, M_1), (Y_2, M_2), ...\}\) with

\[\mu = E(Y) = 2\ [h], \quad Var(Y) = 3\ [h^2],\]
\[\nu = E(M) = 900\ [\$], \quad Var(M) = 360000\ [\$^2].\]

(1) What minimal premium per hour \(\kappa_{0.95}\) has the insurance company to take in so that it will achieve a profit of at least \(\$10^6\) within \(10^4\) hours with probability \(\alpha = 0.95\)?

Since \(\gamma = 1967.2\),

\[P(G(10^4) \ge 10^6) = P(C(10^4) < 10^4 (\kappa_{0.95} - 100)) = \Phi\!\left( \frac{(\kappa_{0.95} - 100) - 450}{2^{-1.5} \cdot 19.672} \right).\]

Since the 0.95-percentile of the standardized normal distribution is \(z_{0.95} = 1.64\), the desired premium per hour \(\kappa_{0.95}\) satisfies equation

\[\frac{\kappa_{0.95} - 550}{6.955} = 1.64.\]

Hence, \(\kappa_{0.95} = 561\ [\$/h]\).

This result does not take into account the fact that in reality the premium size has an influence on the claim flow.

(2) Let the premium income of the company be \(\kappa = 460\ [\$/h]\). Thus, the company has a positive safety loading of \(\sigma = 10\ [\$/h]\). Given an initial capital of \(x = 10^4\ [\$]\), what is the probability of the company to be in the state of ruin at time \(t = 1000\ [h]\)?

This probability is given by

\[P(G(10^3) < -10^4) = \Phi\!\left( \frac{-10^4 - (460 - 450) \cdot 10^3}{2^{-1.5} \cdot 1967.2 \cdot \sqrt{1000}} \right) = \Phi(-0.910) = 0.181. \qquad \square\]
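Both parts of example 7.21 follow from the normal approximation of G(t) and can be recomputed as below. The sketch assumes the parameters stated in the example; the function names are illustrative. It uses the exact 0.95-quantile of the standard normal distribution (about 1.645) instead of the rounded value 1.64, so the premium differs slightly in the first decimal.

```python
import math
from statistics import NormalDist

mu, var_y = 2.0, 3.0             # E(Y) [h], Var(Y) [h^2]
nu, var_m = 900.0, 360_000.0     # E(M) [$], Var(M) [$^2]
gamma = math.sqrt(mu**2 * var_m + nu**2 * var_y)

def profit_std(t):
    """Standard deviation of G(t) = kappa * t - C(t) according to (7.156)."""
    return mu ** -1.5 * gamma * math.sqrt(t)

def min_premium(profit, t, alpha):
    """Premium rate kappa with P(G(t) >= profit) = alpha (normal approximation)."""
    z = NormalDist().inv_cdf(alpha)
    return nu / mu + profit / t + z * profit_std(t) / t

def prob_profit_below(level, kappa, t):
    """P(G(t) < level) under the normal approximation."""
    mean = (kappa - nu / mu) * t
    return NormalDist().cdf((level - mean) / profit_std(t))

kappa_95 = min_premium(1e6, 1e4, 0.95)                  # part (1)
p_ruin_state = prob_profit_below(-1e4, 460.0, 1e3)      # part (2)
print(round(gamma, 1), round(kappa_95, 1), round(p_ruin_state, 3))
```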




7.3.7.2 First Passage Time
Example 7.21 motivates the investigation of the random time \(L(x)\) at which the compound renewal process \(\{C(t), t \ge 0\}\) (\(C(t)\) not necessarily a cost criterion) exceeds a given nominal value \(x\) for the first time:

\[L(x) = \inf_t \{t,\ C(t) > x\}. \tag{7.157}\]

If, for instance, \(x\) is the critical wear limit of an item, then crossing level \(x\) is commonly referred to as the occurrence of a drift failure. Hence, in this case it is justified to denote \(L\) as the lifetime of the system (Figure 7.14).


[Figure 7.14 Level crossing of a compound stochastic process]


Since the \(M_i\) are nonnegative random variables, the compound renewal process \(\{C(t), t \ge 0\}\) has nondecreasing sample paths. In such a case, the following relationship between the distribution function of the first passage time \(L(x)\) and the distribution function of the compound random variable \(C(t)\) is obvious (Figure 7.14):

\[P(L(x) \le t) = P(C(t) > x). \tag{7.158}\]

Specifically, if \(\{N(t), t \ge 0\}\) is the homogeneous Poisson process, then, from formulas (7.151) and (7.158),

\[P(L(x) > t) = e^{-\lambda t} \sum_{n=0}^{\infty} G^{*(n)}(x)\, \frac{(\lambda t)^n}{n!}; \quad t \ge 0,\]

with \(x\), \(x > 0\), fixed. The probability distribution of \(L(x)\) is generally not explicitly available. Hence the following theorem (Gut (1990)) is important for applications, since it provides information on the asymptotic behavior of the distribution of \(L(x)\) as \(x \to \infty\). The analogy of this theorem to theorem 7.20 is obvious.


Theorem 7.21 If \(\gamma^2 = \mu^2\, Var(M) + \nu^2\, Var(Y) > 0\), then

\[\lim_{x \to \infty} P\!\left( \frac{L(x) - \frac{E(Y)}{E(M)}\, x}{[E(M)]^{-3/2}\, \gamma \sqrt{x}} \le t \right) = \Phi(t),\]

where \(\Phi(t)\) is the distribution function of the standardized normal distribution. ■

Actually, in view of our assumption that the compound process \(\{C(t), t \ge 0\}\) has nondecreasing sample paths, relationship (7.158) implies that theorems 7.20 and 7.21 are equivalent.

A consequence of theorem 7.21 is that, for large \(x\), the first passage time \(L = L(x)\) has approximately a normal distribution with parameters

\[E(L(x)) = \frac{E(Y)}{E(M)}\, x \quad \text{and} \quad Var(L(x)) = [E(M)]^{-3}\, \gamma^2\, x:\]

\[L(x) \approx N\!\left( \frac{E(Y)}{E(M)}\, x,\ [E(M)]^{-3}\, \gamma^2\, x \right), \quad x > 0. \tag{7.159}\]

The probability distribution given by (7.159) is called Birnbaum-Saunders distribution.


Example 7.22 Mechanical wear of an item is caused by shocks. (For instance, for the brake discs of a car, every application of the brakes is a shock.) After the \(i\)th shock the degree of wear of the item increases by \(M_i\) units. The \(M_1, M_2, ...\) are supposed to be independent random variables, which are identically normally distributed as \(M\) with parameters

\[E(M) = 9.2 \quad \text{and} \quad \sqrt{Var(M)} = 2.8 \ [\text{in } 10^{-4}\, mm].\]

The initial degree of wear of the item is zero. The item is replaced by an equivalent new one if the total degree of wear exceeds a critical level of 0.1 mm.

(1) What is the probability \(p_{100}\) that the item has to be replaced before or at the occurrence of the 100th shock? The degree of wear after 100 shocks is

\[C_{100} = \sum_{i=1}^{100} M_i\]

and has approximately the distribution function (unit of \(x\): \(10^{-4}\, mm\))

\[P(C_{100} \le x) = \Phi\!\left( \frac{x - 9.2 \cdot 100}{\sqrt{2.8^2 \cdot 100}} \right) = \Phi\!\left( \frac{x - 920}{28} \right).\]

Thus, the item survives the first 100 shocks with probability

\[1 - p_{100} = P(C_{100} \le 1000) = \Phi(2.86) = 0.998.\]

Hence, \(p_{100} = 0.002\).


(2) In addition to the parameters of \(M\), the random cycle \(Y\) is assumed to have mean value and standard deviation

\[E(Y) = 6 \quad \text{and} \quad \sqrt{Var(Y)} = 2 \ [\text{hours}].\]

What is the probability that the nominal value of 0.1 mm is not exceeded within the time interval \([0, 600]\) (hours)?

To answer this question, theorem 7.21 can be applied since 0.1 mm is sufficiently large in comparison to the shock parameter \(E(M)\). Provided \(M\) and \(Y\) are independent, the parameter \(\gamma\) is \(\gamma = 0.0024916\). Hence,

\[P(L(0.1) > 600) = 1 - \Phi\!\left( \frac{600 - \frac{6}{9.2} \cdot 10^3}{(9.2)^{-3/2} \cdot 2491.6 \cdot \sqrt{0.1}} \right) = 1 - \Phi(-1.848).\]

Thus, the desired probability is \(P(L(0.1) > 600) = 0.967\). □
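The two parts of example 7.22 can be recomputed as follows. This Python sketch works throughout in units of 10^-4 mm, so the critical level is 1000. One assumption is made explicit: the value 2 given for Y is treated as the standard deviation of Y, because this reproduces the stated parameter γ = 0.0024916 (equal to 24.916 in the units used here). The variable names are illustrative.

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf

e_m, sd_m = 9.2, 2.8     # wear increment per shock, unit 1e-4 mm
e_y, sd_y = 6.0, 2.0     # time between shocks [h]; 2.0 assumed to be a standard deviation
limit = 1000.0           # critical level 0.1 mm expressed in units of 1e-4 mm

# (1) C_100 is a sum of 100 independent normal increments
z_100 = (limit - 100 * e_m) / (sd_m * math.sqrt(100))
survive_100 = phi(z_100)

# (2) normal approximation (7.159) for the first passage time L(x)
gamma = math.sqrt(e_y**2 * sd_m**2 + e_m**2 * sd_y**2)
mean_l = e_y / e_m * limit
std_l = e_m ** -1.5 * gamma * math.sqrt(limit)
z_600 = (600.0 - mean_l) / std_l
survive_600h = 1.0 - phi(z_600)

print(round(z_100, 2), round(gamma, 3), round(z_600, 3))   # 2.86 24.916 -1.848
```

The standard normal distribution function at 2.86 is about 0.998, so the probability of surviving the first 100 shocks is about 0.998.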


Example 7.23 Let the risk process \(\{(Y_1, M_1), (Y_2, M_2), ...\}\) have the parameters

\[\mu = E(Y) = 5\ [h], \quad Var(Y) = 25\ [h^2],\]
\[\nu = E(M) = 1000\ [\$], \quad Var(M) = 640000\ [\$^2].\]

What is the probability that the total claim reaches level \(a = 10^6\ [\$]\) before or at time point \(t = 5500\ [h]\)?

a) Since \(\gamma = 6403\), because of (7.159),

\[P(L(10^6) \le 5500) \approx \Phi\!\left( \frac{5500 - \frac{5}{1000} \cdot 10^6}{1000^{-1.5} \cdot 6403 \cdot \sqrt{10^6}} \right) = \Phi(2.4694)\]

so that

\[P(L(10^6) \le 5500) \approx 0.993.\]

b) Now the same question is answered by making use of (7.156) and (7.158):

\[P(L(10^6) \le 5500) = P(C(5500) > 10^6) = 1 - P(C(5500) \le 10^6)\]
\[\approx 1 - \Phi\!\left( \frac{10^6 - \frac{1000}{5} \cdot 5500}{5^{-1.5} \cdot 6403 \cdot \sqrt{5500}} \right) = 1 - \Phi(-2.354)\]

so that

\[P(L(10^6) \le 5500) = P(C(5500) > 10^6) \approx 0.991.\]

Taking into account the piecewise constant sample paths of the compound process \(\{C(t), t \ge 0\}\), there is an excellent correspondence between the results obtained under a) and b). □
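The two routes a) and b) of example 7.23 can be compared numerically. The following Python sketch only assumes the parameters stated in the example; variable names are illustrative. Because it uses the unrounded value of γ, the arguments of Φ may differ from the printed ones in the last digit.

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf

mu, var_y = 5.0, 25.0            # E(Y) [h], Var(Y) [h^2]
nu, var_m = 1000.0, 640_000.0    # E(M) [$], Var(M) [$^2]
gamma = math.sqrt(mu**2 * var_m + nu**2 * var_y)
level, t = 1e6, 5500.0

# a) the first passage time L(level) is approximately normal, see (7.159)
z_a = (t - mu / nu * level) / (nu ** -1.5 * gamma * math.sqrt(level))
p_a = phi(z_a)

# b) C(t) is approximately normal, see (7.156), combined with (7.158)
z_b = (level - nu / mu * t) / (mu ** -1.5 * gamma * math.sqrt(t))
p_b = 1.0 - phi(z_b)

print(round(p_a, 3), round(p_b, 3))   # 0.993 0.991
```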
7.4 EXERCISES


Sections 7.1 and 7.2 


7.1) The occurrence of catastrophic accidents at Sosal & Sons follows a homogeneous Poisson process with intensity \(\lambda = 3\) a year.

(1) What is the probability \(p_{\ge 2}\) that at least two catastrophic accidents will occur in the second half of the current year?

(2) Determine the same probability given that two catastrophic accidents have occurred in the first half of the current year.


7.2) By making use of the independence and homogeneity of the increments of a homogeneous Poisson process with intensity \(\lambda\), show that its covariance function is given by

\[C(s, t) = \lambda \min(s, t).\]


7.3) The number of cars which pass a certain intersection daily between 12:00 and 14:00 follows a homogeneous Poisson process with intensity \(\lambda = 40\) per hour. Among these there are 2.2% which disregard the stop sign. The car drivers behave independently with regard to ignoring stop signs.

(1) What is the probability that at least two cars disregard the stop sign between 12:30 and 13:30?

(2) A car driver who ignores the stop sign at this intersection causes an accident there with probability 0.05. What is the probability of one or more accidents at this intersection between 12:30 and 13:30 caused by a driver who ignores the stop sign?


7.4) A Geiger counter is struck by radioactive particles according to a homogeneous Poisson process with intensity \(\lambda = 1\) per 12 seconds. On average, the Geiger counter only records 4 out of 5 particles.

(1) What is the probability \(p_{\ge 2}\) that the Geiger counter records at least 2 particles a minute?

(2) What are mean value and variance of the random time Y between the occurrence 
of two successively recorded particles? 


7.5) The location of trees in an even, rectangular forest stand of size 200 m × 500 m follows a homogeneous Poisson distribution with intensity \(\lambda = 1\) per 25 m². The diameter of the stems of all trees at a distance of 130 cm to the ground is assumed to be 24 cm. From outside, a shot is vertically fired at a 500 m side of the forest stand (parallel to the ground at level 130 cm). What is the probability that a bullet with diameter 1 cm hits no tree?

Hint With regard to the question, the location of a tree is fully determined by the coordinates 
of the center of the cross-section of its stem at level 130cm. 
7.6) An electronic system is subject to two types of shocks, which occur independently of each other according to homogeneous Poisson processes with intensities

\[\lambda_1 = 0.002 \quad \text{and} \quad \lambda_2 = 0.01 \ \text{per hour},\]


respectively. A shock of type 1 always causes a system failure, whereas a shock of 
type 2 causes a system failure with probability 0.4. 


What is the probability that the system fails within 24 hours due to a shock? 


7.7) A system is subjected to shocks of types 1, 2, and 3, which are generated by independent homogeneous Poisson processes with respective intensities per hour \(\lambda_1 = 0.2\), \(\lambda_2 = 0.3\), and \(\lambda_3 = 0.4\). A type 1-shock causes a system failure with probability 1, a type 2-shock causes a system failure with probability 0.4, and a shock of type 3 causes a system failure with probability 0.2. The shocks occur permanently, whether the system is operating or not.


(1) On condition that three shocks arrive in the interval [0,10h], determine the 
probability that the system does not experience a failure in this interval. 


(2) What is the (unconditional) probability that the system fails in [0, 10h] due to a 
shock? 


7.8) Claims arrive at a branch of an insurance company according to a homogeneous Poisson process with an intensity of \(\lambda = 0.4\) per working hour. The claim size Z has an exponential distribution so that 80% of the claim sizes are below $100 000, whereas 20% are equal to or larger than $100 000.


(1) What is the probability that the fourth claim does not arrive in the first two work- 
ing hours of a day? 
(2) What is the mean size of a claim? 


(3) Determine approximately the probability that the sum of the sizes of 10 consecu- 
tive claims exceeds $800 000. 


7.9) Consider two independent homogeneous Poisson processes 1 and 2 with respective intensities \(\lambda_1\) and \(\lambda_2\). Determine the mean value of the random number of events of process 2 which occur between any two successive events of process 1.


7.10) Let \(\{N(t), t \ge 0\}\) be a homogeneous Poisson process with intensity \(\lambda\).

Prove that for an arbitrary, but fixed, positive \(h\) the stochastic process \(\{X(t), t \ge 0\}\) defined by \(X(t) = N(t + h) - N(t)\) is weakly stationary.


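7.11) Let a homogeneous Poisson process have intensity \(\lambda\), and let \(T_i\) be the time point at which the \(i\)th Poisson event occurs. For \(t \to \infty\), determine and sketch the covariance function \(C(\tau)\) of the shot noise process \(\{X(t), t \ge 0\}\) given by

\[X(t) = \sum_{i=1}^{N(t)} h(t - T_i) \quad \text{with} \quad h(t) = \begin{cases} \sin t & \text{for } 0 \le t \le \pi \\ 0, & \text{elsewhere} \end{cases}\]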
7.12) Statistical evaluation of a large sample justifies modeling the number of cars which arrive daily for petrol between 0:00 and 4:00 a.m. at a particular filling station by an inhomogeneous Poisson process \(\{N(t), t \ge 0\}\) with intensity function

\[\lambda(t) = 8 - 4t + 3t^2 \ [h^{-1}], \quad 0 \le t \le 4.\]

(1) How many cars arrive on average between 0:00 and 4:00 a.m.? 

(2) What is the probability that at least 40 cars arrive between 2:00 and 4:00? 


7.13) Let \(\{N(t), t \ge 0\}\) be an inhomogeneous Poisson process with intensity function

\[\lambda(t) = 0.8 + 2t, \quad t \ge 0.\]

Determine the probability that at least 500 Poisson events occur in \([20, 30]\).


7.14)* Let \(\{N(t), t \ge 0\}\) be a nonhomogeneous Poisson process with trend function \(\Lambda(t)\) and arrival time point \(T_i\) of the \(i\)th Poisson event.

Given \(N(t) = n\), show that the random vector \((T_1, T_2, ..., T_n)\) has the same probability distribution as \(n\) ordered, independent and identically distributed random variables with distribution function

\[F(x) = \begin{cases} \dfrac{\Lambda(x)}{\Lambda(t)} & \text{for } 0 \le x < t, \\ 1, & t \le x. \end{cases}\]


Hint Compare to theorem 7.5 (page 268). 


7.15) Clients arrive at an insurance company according to a mixed Poisson process 
the structure parameter Z of which has a uniform distribution over the interval [0, 1]. 


(1) Determine the state probabilities of this process at time ¢. 
(2) Determine trend and variance function of this process. 


(3) For what values of \(\alpha\) and \(\beta\) are trend and variance function of a Pólya arrival process identical to the ones obtained under (2)?


7.16) A system is subjected to shocks of type 1 and type 2, which are generated by independent Pólya processes \(\{N_{L_1}(t), t \ge 0\}\) and \(\{N_{L_2}(t), t \ge 0\}\) with respective trend and variance functions

\[E(N_{L_1}(t)) = t, \quad Var(N_{L_1}(t)) = t + 0.5\, t^2,\]
\[E(N_{L_2}(t)) = 0.5\, t, \quad Var(N_{L_2}(t)) = 0.5\, t + 0.125\, t^2\]

(time unit: hour). A shock of any type causes a system failure with probability 1. What is the probability that the system fails within 2 hours due to a shock?


7.17)* Prove the multinomial criterion (formula 7.55, page 280). 
7.18) An insurance company has a premium income of $106 080 per day. The claim sizes are iid random variables and have an exponential distribution with variance \(4 \cdot 10^6\ [\$^2]\). On average, 2 claims arrive per hour according to a homogeneous Poisson process. The time horizon is assumed to be infinite.


(1) What probability distribution have the interarrival times between two neighboring 
claims? 
(2) Calculate the company's ruin probability if its initial capital is x = $20 000. 


(3) What minimal initial capital should the company have to make sure that its ruin 
probability does not exceed 0.01? 


7.19) Pramod is setting up an insurance policy for low-class cars (homogeneous portfolio) over an infinite time horizon. Based on previous statistical work, he expects that claims will arrive according to a homogeneous Poisson process with intensity \(\lambda = 0.8\ [h^{-1}]\), and that the claim sizes will be iid as an exponentially distributed random variable \(M\) with mean value \(\nu = E(M) = \$3000\). He reckons with a total premium income of $2800 \([h^{-1}]\).

(1) Given that these assumptions are correct, has Pramod a chance to be financially successful with this portfolio over an infinite period of time?

(2) What is the minimal initial capital \(x_0\) Pramod has to invest to make sure that the lower bound for the survival probability of this portfolio derived from the Lundberg inequality is 0.96?

(3) For the sake of comparison, determine the exact value of the survival probability of this company for an initial capital of \(x_0/3\).


7.20) The lifetime L of a system has a Weibull-distribution with distribution function 
F(t)=P(L< t)=1-e°%!", 420. 
(1) Determine its failure rate \(\lambda(t)\) and its integrated failure rate \(\Lambda(t)\).

(2) The system is maintained according to Policy 1 (page 290, bottom) over an infinite time span. The cost of a minimal repair is \(c_m = 40\ [\$]\), and the cost of a preventive replacement is \(c_p = 2000\ [\$]\).

Determine the cost-optimum replacement interval \(t^*\) and the corresponding minimal maintenance cost rate \(K_1(t^*)\).


7.21) A system is maintained according to Policy 3 (page 292, top) over an infinite 
time span. It has the same lifetime distribution and minimal repair cost parameter as 
in exercise 7.20. As with exercise 7.20, let c; = 2000. 


(1) Determine the optimum integer \(n = n^*\), and the corresponding maintenance cost rate \(K_3(n^*)\).


(2) Compare K3(n*) to K,(t*) (exercise 7.20) and try to intuitively explain the result. 
Sections 7.3 and 7.4

Note Exercises 7.22 to 7.31 refer to ordinary renewal processes. The functions \(f(t)\) and \(F(t)\) denote density and distribution function; the parameters \(\mu\) and \(\mu_2\) are mean value and second moment of the cycle length \(Y\). \(N(t)\) is the (random) renewal counting function, and \(H(t)\) denotes the corresponding renewal function.


7.22) A system starts working at time \(t = 0\). Its lifetime has approximately a normal distribution with mean value \(\mu = 125\) hours and standard deviation \(\sigma = 40\) hours. After a failure, the system is replaced with an equivalent new one in negligible time, and it immediately takes up its work. All system lifetimes are independent.

(1) What is the minimal number of systems which must be available in order to be able to maintain the replacement process over an interval of length 500 hours with probability 0.99?

(2) Solve the same problem on condition that the system lifetime has an exponential distribution with mean value \(\mu = 125\).


7.23) (1) Use the Laplace transformation to find the renewal function \(H(t)\) of an ordinary renewal process whose cycle lengths have an Erlang distribution with parameters \(n = 2\) and \(\lambda\).

(2) For \(\lambda = 1\), sketch the exact graph of the renewal function and the bounds (7.117) in the interval \(0 \le t \le 6\). Make sure the bounds (7.117) are applicable.


7.24) An ordinary renewal process has the renewal function \(H(t) = t/10\). Determine the probability \(P(N(10) = 2)\).


7.25) A system is preventively replaced by an identical new one at time points \(\tau, 2\tau, ...\) If failures happen in between, then the failed system is replaced by an identical new one as well. The latter replacement actions are called emergency replacements. This replacement policy is called block replacement. The costs for preventive and emergency replacements are \(c_p\) and \(c_e\), \(0 < c_p < c_e\), respectively. The lifetime \(L\) of a system is assumed to have distribution function

\[F(t) = P(L \le t) = (1 - e^{-\lambda t})^2, \quad t \ge 0.\]

(1) Determine the renewal function \(H(t)\) of the ordinary renewal process with cycle length distribution function \(F(t)\).

(2) Based on the renewal reward theorem (7.148), give a formula for the long-run maintenance cost rate \(K(\tau)\) under the block replacement policy.

(3) Determine an optimal \(\tau = \tau^*\) with regard to \(K(\tau)\) for \(\lambda = 0.1\), \(c_e = 180\), \(c_p = 100\).


(4) Under otherwise the same assumptions, determine the cost rate if the system is 
only replaced after failures and compare it with the one obtained under (3). 
7.26) Given the existence of the first three moments of the cycle length \(Y\) of an ordinary renewal process, verify the formulas (7.112).


7.27) (1) Verify that the probability \(p(t) = P(N(t) \text{ is odd})\) satisfies

\[p(t) = F(t) - \int_0^t p(t - x)\, f(x)\, dx, \quad f(x) = F'(x).\]

(2) Determine this probability if the cycle lengths are exponential with parameter \(\lambda\).

7.28)* Verify that the second moment of \(N(t)\), denoted as \(H_2(t) = E(N^2(t))\), satisfies the integral equation

\[H_2(t) = 2H(t) - F(t) + \int_0^t H_2(t - x)\, f(x)\, dx.\]


Hint Verify the equation directly or by applying the Laplace transformation. 


7.29) The times between the arrivals of successive particles at a counter generate an ordinary renewal process. Its random cycle length \(Y\) has distribution function \(F(t)\) and mean value \(\mu = E(Y)\). After having recorded 10 particles, the counter is blocked for \(\tau\) time units. Particles arriving during a blocked period are not registered.

What is the distribution function of the time from the end of a blocked period to the arrival of the first particle after this period if \(\tau \to \infty\)?


7.30) The cycle length distribution of an ordinary renewal process is given by the distribution function \(F(t) = 1 - e^{-t^2}\), \(t \ge 0\) (Rayleigh distribution).

(1) What is the statement of theorem 7.13 if \(g(x) = (x + 1)^{-2}\), \(x \ge 0\)?
(2) What is the statement of theorem 7.15?


7.31) Let \(A(t)\) be the forward and \(B(t)\) the backward recurrence times of an ordinary renewal process at time \(t\). For \(x > y/2\), determine functional relationships between \(F(t)\) and the conditional probabilities

(1) \(P(A(t) > y - t \mid B(t) = t - x)\), \(0 \le x < t < y\),
(2) \(P(A(t) \le y \mid B(t) = x)\), \(0 \le x < t\), \(y > 0\).


7.32) Let \((Y, Z)\) be the typical cycle of an alternating renewal process, where \(Y\) and \(Z\) have an Erlang distribution with joint parameter \(\lambda\) and parameters \(n = 2\) and \(n = 1\), respectively. For \(t \to \infty\), determine the probability that the system is in state 1 at time \(t\) and that it stays in this state over the entire interval \([t, t + x]\), \(x > 0\) (process states as introduced in section 7.3.6).


7.33) The time intervals between successive repairs of a system generate an ordinary renewal process \(\{Y_1, Y_2, ...\}\) with typical cycle length \(Y\). The costs of repairs are mutually independent and independent of \(\{Y_1, Y_2, ...\}\). Let \(M\) be the typical repair cost and

\[\mu = E(Y) = 180\ [\text{days}] \quad \text{and} \quad \sigma = \sqrt{Var(Y)} = 30,\]
\[\nu = E(M) = 200\ [\$] \quad \text{and} \quad \sqrt{Var(M)} = 40.\]
Determine approximately the probabilities that 


(1) the total repair costs arising in [0, 3600 days] do not exceed $4500, and 
(2) a total repair cost of $3000 is not exceeded before 2200 days. 


7.34) (1) Determine the ruin probability \(p(x)\) of an insurance company with an initial capital of \(x = \$20000\) and operating parameters

\[1/\mu = 2\ [h^{-1}], \quad \nu = \$800 \quad \text{and} \quad \kappa = 1700\ [\$/h].\]

(2) Under otherwise the same conditions, draw the graphs of the ruin probability for \(x = 20000\) and \(x = 0\) in dependence on \(\kappa\) over the interval \(1600 \le \kappa \le 1800\).

(3) With the numerical parameters given under (1), determine the upper bound \(e^{-rx}\) for \(p(x)\) given by the Lundberg inequality (7.85).

(4) Under otherwise the same conditions, draw the graph of \(e^{-rx}\) with \(x = 20000\) in dependence on \(\kappa\) over the interval \(1600 \le \kappa \le 1800\) and compare to the corresponding graph obtained under (2).


Note For problems (1) to (4), the model assumptions made in example 7.10 apply. 


7.35) Under otherwise the same assumptions as made in example 7.10, determine the 
ruin probability if the random claim size M has density 


\[b(y) = \lambda^2 y\, e^{-\lambda y}, \quad \lambda > 0, \ y \ge 0.\]


This is an Erlang-distribution with parameters A and n = 2. 


7.36) Claims arrive at an insurance company according to an ordinary renewal process \(\{Y_1, Y_2, ...\}\). The corresponding claim sizes \(M_1, M_2, ...\) are independent and identically distributed as \(M\) and independent of \(\{Y_1, Y_2, ...\}\). Let the \(Y_i\) be distributed as \(Y\); i.e., \(Y\) is the typical interarrival interval. Then \((Y, M)\) is the typical interarrival cycle. From historical observations it is known that

\[\mu = E(Y) = 1\ [h], \quad Var(Y) = 0.25, \quad \nu = E(M) = \$800, \quad Var(M) = 250\,000.\]

Find approximate answers to the following problems:
(1) What minimum premium per unit time \(\kappa_{0.99}\) has the insurance company to take


in so that it will make a profit of at least $10° within 20 000 hours with probability 
\(\alpha = 0.99\)?

(2) What is the probability that the total claim reaches level \(\$10^5\) within 135 h?


Note Before possibly reaching its goals, the insurance company may have experienced one or 
more ruins with subsequent 'red number periods’. 


CHAPTER 8 


Discrete-Time Markov Chains 


8.1 FOUNDATIONS AND EXAMPLES 


This chapter is devoted to discrete-time stochastic processes \(\{X_0, X_1, ...\}\) with discrete state space \(\mathbf{Z}\) which have the Markov property. That is, on condition \(X_n = x_n\) the random variable \(X_{n+1}\) is independent of all \(X_0, X_1, ..., X_{n-1}\). However, without this condition, \(X_{n+1}\) may very well depend on all the other \(X_i\), \(i \le n\).


Definition 8.1 Let \(\{X_0, X_1, ...\}\) be a stochastic process in discrete time with discrete state space \(\mathbf{Z}\). Then \(\{X_0, X_1, ...\}\) is a discrete-time Markov chain if for all vectors \(x_0, x_1, ..., x_{n+1}\) with \(x_k \in \mathbf{Z}\) and for all \(n = 1, 2, ...\),

\[P(X_{n+1} = x_{n+1} \mid X_n = x_n, ..., X_1 = x_1, X_0 = x_0) = P(X_{n+1} = x_{n+1} \mid X_n = x_n). \tag{8.1}\]
■


Condition (8.1) is called the Markov property. It can be interpreted as follows: If time point \(t = n\) is the present, then \(t = n + 1\) is a time point in the future, and the time points \(t = n - 1, ..., 1, 0\) are in the past. Thus,


The future development of a discrete-time Markov chain depends only on its 
present state, but not on its evolution in the past. 


For the special class of stochastic processes considered in this chapter, definition 8.1 
is equivalent to the definition of the Markov property via (6.23) at page 233. It usual- 
ly requires much effort to check by statistical methods, whether a particular stochast- 
ic process has the Markov property (8.1). Hence one should first try to confirm or to 
reject this hypothesis by considering properties of the underlying technical, physical, 
economical, or other practical background. For instance, the final profit of a gambler 
usually depends on his present profit, but not on the way the gambler has obtained it. 
If it is known that at the end of the » th month a manufacturer has sold a total of 
Xn =Xn personal computers, then for predicting the total number of computers X;,41 

sold a month later knowledge about the number of computers sold within the first n 
months will make no difference. A car driver checks the tread depth of his tires after 
every 5000 km. For predicting the tread depth after a further 5000 Am, the driver will 
only need the present tread depth, not how the tread depth has evolved to its present 
level. For predicting, however, the future concentration of noxious substances in the 
air, it has been proved necessary to take into account not only the present value of 
the concentration, but also the past development leading to this value. In this chapter 
it will be assumed that the state space of the Markov chain is $Z = \{0, \pm 1, \pm 2, \ldots\}$ or
a subset of it. Generally, states will be denoted as $i, j, k, \ldots$.

Transition Probabilities The conditional probabilities

$$p_{ij}(n) = P(X_{n+1} = j \mid X_n = i); \quad n = 0, 1, \ldots$$

are the one-step transition probabilities of the Markov chain. A Markov chain is
called homogeneous if it has homogeneous increments. Thus, a Markov chain is
homogeneous if and only if its one-step transition probabilities do not depend on $n$:

$$p_{ij}(n) = p_{ij} \quad \text{for all } n = 0, 1, \ldots$$


Note This chapter only deals with homogeneous Markov chains. For the sake of brevity, the 
attribute homogeneous is generally omitted. 


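The one-step transition probabilities are combined in the matrix of the one-step
transition probabilities (shortly: transition matrix) $P$:

$$P = \begin{pmatrix}
p_{00} & p_{01} & p_{02} & \cdots \\
p_{10} & p_{11} & p_{12} & \cdots \\
\vdots & \vdots & \vdots & \\
p_{i0} & p_{i1} & p_{i2} & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix}$$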

$p_{ij}$ is the probability of a transition from state $i$ to state $j$ in one step (or, equivalently,
in one time unit, in one jump). With probability $p_{ii}$ the Markov chain remains
in state $i$ for another time unit. The one-step transition probabilities have some obvious
properties:

$$p_{ij} \ge 0, \quad \sum_{j \in Z} p_{ij} = 1; \quad i \in Z. \tag{8.2}$$

The $m$-step transition probabilities of a Markov chain are defined as

$$p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i); \quad m = 1, 2, \ldots \tag{8.3}$$

Thus, $p_{ij}^{(m)}$ is the probability that the Markov chain, starting from state $i$, will be
in state $j$ after $m$ steps. However, in between the Markov chain may already have arrived
at state $j$. Note that $p_{ij}^{(1)} = p_{ij}$.

It is convenient to introduce the notation

$$p_{ij}^{(0)} = \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \tag{8.4}$$

$\delta_{ij}$ defined in this way is called the Kronecker symbol.


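The following relationship between the multi-step transition probabilities of a
discrete-time Markov chain is called the

Chapman-Kolmogorov equations:

$$p_{ij}^{(m)} = \sum_{k \in Z} p_{ik}^{(r)} \, p_{kj}^{(m-r)}; \quad r = 0, 1, \ldots, m. \tag{8.5}$$

The proof is easy: Conditioning with regard to the state which the Markov chain
assumes after $r$ steps, $0 \le r \le m$, and making use of the Markov property yields

$$\begin{aligned}
p_{ij}^{(m)} = P(X_m = j \mid X_0 = i) &= \sum_{k \in Z} P(X_m = j, X_r = k \mid X_0 = i) \\
&= \sum_{k \in Z} P(X_m = j \mid X_r = k, X_0 = i) \, P(X_r = k \mid X_0 = i) \\
&= \sum_{k \in Z} P(X_m = j \mid X_r = k) \, P(X_r = k \mid X_0 = i) \\
&= \sum_{k \in Z} p_{ik}^{(r)} \, p_{kj}^{(m-r)}.
\end{aligned}$$

This proves formula (8.5).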


Notation is simplified by introducing the matrix of the $m$-step transition probabilities
of the Markov chain:

$$P^{(m)} = \left(\left(p_{ij}^{(m)}\right)\right); \quad m = 0, 1, \ldots$$

Then the Chapman-Kolmogorov equations can be written in the elegant form

$$P^{(m)} = P^{(r)} \, P^{(m-r)}; \quad r = 1, 2, \ldots, m.$$

This relationship implies that

$$P^{(m)} = P^m.$$

Thus, the matrix of the $m$-step transition probabilities is equal to the $m$-fold product
of the matrix of the one-step transition probabilities.
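The matrix form of the Chapman-Kolmogorov equations can be checked numerically. The following is a minimal sketch in Python, assuming a small, purely hypothetical three-state transition matrix; the helper names `mat_mul` and `mat_pow` are illustrative and not part of the text. Exact fractions are used so that the comparison is free of rounding error.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, m):
    """m-step transition matrix P^(m) = P^m; P^(0) is the identity (Kronecker symbol)."""
    n = len(P)
    result = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, P)
    return result

# Hypothetical 3-state transition matrix (illustration only).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0),    F(1, 3), F(2, 3)]]

m = 5
P_m = mat_pow(P, m)
# Chapman-Kolmogorov in matrix form: P^(m) = P^(r) P^(m-r) for r = 0, 1, ..., m.
ck_holds = all(mat_mul(mat_pow(P, r), mat_pow(P, m - r)) == P_m
               for r in range(m + 1))
# Every row of an m-step transition matrix is again a probability distribution.
rows_sum_to_one = all(sum(row) == 1 for row in P_m)
print(ck_holds, rows_sum_to_one)
```

Repeated multiplication is chosen here for transparency; for large $m$, repeated squaring would need fewer matrix products.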


A probability distribution $p^{(0)}$ of $X_0$ is said to be an initial distribution of the
Markov chain:

$$p^{(0)} = \left\{ p_i^{(0)} = P(X_0 = i), \; i \in Z \right\}, \quad \sum_{i \in Z} p_i^{(0)} = 1. \tag{8.6}$$

A Markov chain is completely characterized by its transition matrix $P$ and an initial
distribution $p^{(0)}$. In order to prove this one has to show that, given $P$ and $p^{(0)}$, all its
finite-dimensional probabilities can be determined: By the Markov property, for any
finite set of states $i_0, i_1, \ldots, i_n$,

$$\begin{aligned}
P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n)
&= P(X_n = i_n \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}) \cdot P(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}) \\
&= P(X_n = i_n \mid X_{n-1} = i_{n-1}) \cdot P(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}) \\
&= p_{i_{n-1} i_n} \cdot P(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}).
\end{aligned}$$

The second factor in the last line is now treated in the same way. Continuing in this
way yields

$$P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = p_{i_0}^{(0)} \cdot p_{i_0 i_1} \cdot p_{i_1 i_2} \cdots p_{i_{n-1} i_n}. \tag{8.7}$$

This proves the assertion. The absolute or one-dimensional state probabilities of the
Markov chain after $m$ steps are denoted as

$$p_j^{(m)} = P(X_m = j), \quad j \in Z.$$

The set $\{p_j^{(m)}, \; j \in Z\}$ is the absolute (one-dimensional) probability distribution of the
Markov chain after $m$ steps, $m = 0, 1, \ldots$. Given an initial distribution
$p^{(0)} = \{p_i^{(0)}, \; i \in Z\}$, by the total probability rule,

$$p_j^{(m)} = \sum_{i \in Z} p_i^{(0)} \, p_{ij}^{(m)}, \quad m = 1, 2, \ldots \tag{8.8}$$
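Formulas (8.7) and (8.8) translate directly into a few lines of code. The sketch below assumes a hypothetical two-state chain and initial distribution chosen only for illustration; the function names are not from the text. The absolute distribution is obtained by applying the one-step matrix $m$ times, which by $P^{(m)} = P^m$ is equivalent to using the $m$-step transition probabilities in (8.8).

```python
from fractions import Fraction as F

def path_probability(p0, P, path):
    """P(X_0 = i_0, ..., X_n = i_n) according to formula (8.7)."""
    prob = p0[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

def step_distribution(p, P):
    """Total probability rule for one step: p_j' = sum_i p_i * p_ij."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def absolute_distribution(p0, P, m):
    """Absolute state probabilities after m steps, cf. formula (8.8)."""
    p = list(p0)
    for _ in range(m):
        p = step_distribution(p, P)
    return p

# Hypothetical two-state chain started in state 0 (illustration only).
P = [[F(9, 10), F(1, 10)],
     [F(1, 2),  F(1, 2)]]
p0 = [F(1), F(0)]

path_prob = path_probability(p0, P, [0, 0, 1, 1])   # 1 * 9/10 * 1/10 * 1/2
p2 = absolute_distribution(p0, P, 2)
print(path_prob, p2)
```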


Definition 8.2 An initial distribution $\{\pi_i = P(X_0 = i); \; i \in Z\}$ is called stationary if
it satisfies the system of linear equations

$$\pi_j = \sum_{i \in Z} \pi_i \, p_{ij}, \quad j \in Z, \tag{8.9}$$

$$1 = \sum_{i \in Z} \pi_i. \tag{8.10}$$

It can be shown by induction that, starting with a stationary initial distribution, the
absolute state distributions of the Markov chain for any number $m$ of steps coincide
with the stationary initial distribution, i.e., for all $j \in Z$,

$$p_j^{(m)} = \sum_{i \in Z} \pi_i \, p_{ij}^{(m)} = \pi_j, \quad m = 1, 2, \ldots \tag{8.11}$$

In this case, the Markov chain is said to be in a (global) state of equilibrium, and the
probabilities $\pi_j$ are also called equilibrium state probabilities of the Markov chain.
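For a finite state space, the system (8.9), (8.10) can be solved mechanically. The following is a minimal sketch, assuming that the chain is finite and has exactly one stationary distribution; the function name and the two-state matrix are hypothetical. One of the equations (8.9) is redundant and is replaced by the normalizing condition (8.10); the system is then solved by Gauss-Jordan elimination with exact fractions.

```python
from fractions import Fraction as F

def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 (assumes a unique solution exists)."""
    n = len(P)
    # Rows j = 0..n-2: sum_i pi_i (p_ij - delta_ij) = 0; last column is the right-hand side.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    # Last row: normalizing condition (8.10).
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

# Hypothetical two-state chain (illustration only).
P = [[F(9, 10), F(1, 10)],
     [F(1, 2),  F(1, 2)]]
pi = stationary_distribution(P)
# Property (8.11) for m = 1: one step leaves the distribution unchanged.
after_one_step = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print(pi, after_one_step == pi)
```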


If a stationary initial distribution exists, then the structure (8.7) of the n-dimensional 
state probabilities of the Markov chain verifies theorem 6.1: 


A homogeneous Markov chain is strictly stationary if and only if its
(one-dimensional) absolute state probabilities do not depend on time.


Markov chains in discrete time virtually occur in all fields of science, engineering, 
operations research, economics, risk analysis, and finance. In what follows, this will 
be illustrated by some examples. 


Example 8.1 (unbounded symmetric random walk) A particle moves along the real
axis in one step from an integer-valued coordinate $i$ either to $i + 1$ or to $i - 1$ with
equal probabilities. The steps occur independently of each other. If $X_0$ is the start
position of the particle and $X_n$ its position after $n$ steps, then $\{X_0, X_1, \ldots\}$ is a
discrete-time Markov chain with state space $Z = \{0, \pm 1, \pm 2, \ldots\}$ and one-step transition
probabilities

$$p_{ij} = \begin{cases} 1/2 & \text{for } j = i + 1 \text{ or } j = i - 1, \\ 0 & \text{otherwise.} \end{cases}$$

It is quite intuitive that the unbounded symmetric random walk cannot have a stationary
initial distribution. An exact argument will be given later. □


Example 8.2 (random walk with reflecting barriers: Ehrenfest's diffusion model)
For a given positive integer $z$, the state space of a Markov chain is $Z = \{0, 1, \ldots, 2z\}$.
A particle moves from position $i$ to position $j$ in one step with probability

$$p_{ij} = \begin{cases}
\dfrac{2z - i}{2z} & \text{for } j = i + 1, \\[1ex]
\dfrac{i}{2z} & \text{for } j = i - 1, \\[1ex]
0 & \text{otherwise.}
\end{cases} \tag{8.12}$$

Thus, the greater the distance of the particle from the central point $z$ of $Z$, the greater
the probability that the particle moves in the next step into the direction of the central
point. Once the particle has arrived at one of the end points $x = 0$ or $x = 2z$, it will
return in the next step with probability 1 to position $x = 1$ or $x = 2z - 1$, respectively.
(Hence the terminology reflecting barriers.) If the particle is at $x = z$, then the
probabilities of moving to the left or to the right in the next step are equal, namely 1/2. In
this sense, the particle is at $x = z$ in an equilibrium state. This situation may be
thought of as caused by a force, which is situated at the central point. Its attraction to
a particle linearly increases with the particle's distance from this point.


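A stationary state distribution exists and satisfies the corresponding system of linear
equations (8.9):

$$\begin{aligned}
\pi_0 &= \pi_1 \, p_{10}, \\
\pi_j &= \pi_{j-1} \, p_{j-1,j} + \pi_{j+1} \, p_{j+1,j}, \quad j = 1, 2, \ldots, 2z - 1, \\
\pi_{2z} &= \pi_{2z-1} \, p_{2z-1,2z}.
\end{aligned}$$

The solution, taking into account the normalizing condition (8.10), is

$$\pi_j = \binom{2z}{j} \, 2^{-2z}, \quad j = 0, 1, \ldots, 2z.$$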


As expected, state z has the greatest stationary probability. 
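The binomial form of the stationary distribution can be verified directly against the transition probabilities (8.12). The sketch below assumes the small illustrative value $z = 3$; the function name `ehrenfest_matrix` is hypothetical.

```python
from fractions import Fraction as F
from math import comb

def ehrenfest_matrix(z):
    """Transition matrix (8.12) on the state space {0, 1, ..., 2z}."""
    n = 2 * z + 1
    P = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        if i < 2 * z:
            P[i][i + 1] = F(2 * z - i, 2 * z)   # step to the right
        if i > 0:
            P[i][i - 1] = F(i, 2 * z)           # step to the left
    return P

z = 3                                            # illustrative choice
P = ehrenfest_matrix(z)
n = 2 * z + 1
# Candidate stationary distribution: binomial with parameters 2z and 1/2.
pi = [F(comb(2 * z, j), 2 ** (2 * z)) for j in range(n)]
pi_P = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
print(pi_P == pi, sum(pi) == 1, max(pi) == pi[z])
```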


P. and T. Ehrenfest (1907) came across this random walk with reflecting barriers 
when investigating the following diffusion model: In a closed container there are 
exactly 2z molecules of a particular type. The container is separated into two equal 
parts by a membrane, which is permeable to these molecules. Let $X_n$ be the random
number of molecules in one part of the container after $n$ transitions of any molecule
from one part of the container to the other one. If $X_0$ denotes the initial number of
molecules in the specified part of the container, then they observed that the random
sequence $\{X_0, X_1, \ldots\}$ behaves approximately as a Markov chain with transition
probabilities (8.12). Hence, the more molecules are in one part of the container, the more
they want to move into the other part. In other words, the system tends to the equilibrium
state, i.e. to equal numbers of particles in each part of the container. □


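Example 8.3 (random walk with two absorbing barriers) The movement of a particle
within the state space $Z = \{0, 1, \ldots, z\}$, $z > 1$, is controlled by a discrete-time Markov
chain $\{X_0, X_1, \ldots\}$ with transition probabilities

$$p_{ij} = \begin{cases}
p & \text{for } j = i + 1, \; 1 \le i \le z - 1, \\
q & \text{for } j = i - 1, \; 1 \le i \le z - 1, \\
1 & \text{for } i = j = 0 \text{ or } i = j = z, \\
0 & \text{otherwise.}
\end{cases}$$

Hence, $x = 0$ and $x = z$ are absorbing states ('barriers'), i.e., if the particle arrives at
state 0 or at state $z$, it cannot leave these states anymore: $p_{00} = 1$, $p_{zz} = 1$. The matrix
of the one-step transition probabilities is

$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
0 & q & 0 & p & \cdots & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & q & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}$$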


This random walk cannot have a stationary initial distribution, since given any initial 
distribution the Markov chain will arrive at an absorbing barrier with probability 1 in 
finite time. 


Absorption It is an interesting and important exercise to determine the probabilities
of absorption of the particle at $x = 0$ and $x = z$, respectively. Let $a(n)$ be the probability
of absorption at $x = 0$ if the particle starts moving from $x = n$, $0 < n < z$. On condition
that the particle moves from $n$ to the right, its absorption probability at $x = 0$ is
$a(n + 1)$ if $n + 1 \le z$. On condition that the particle moves from $n$ to the left, the
absorption probability at $x = 0$ is $a(n - 1)$ if $n - 1 \ge 0$. Hence, in view of the formula of
total probability (1.24), $a(n)$ satisfies the system of linear equations

$$a(n) = p \cdot a(n + 1) + q \cdot a(n - 1); \quad n = 1, 2, \ldots, z - 1. \tag{8.13}$$

The boundary conditions are

$$a(0) = 1, \quad a(z) = 0. \tag{8.14}$$
Since $p + q = 1$, the term $a(n)$ on the left-hand side of (8.13) can be replaced with
$p \, a(n) + q \, a(n)$. This yields the following system of equations for the $a(n)$:

$$a(n) - a(n + 1) = \frac{q}{p} \, [a(n - 1) - a(n)], \quad n = 1, 2, \ldots, z - 1. \tag{8.15}$$

Starting with $n = 1$, repeated application of (8.15) yields

$$\begin{aligned}
a(0) - a(1) &= 1 - a(1) \\
a(1) - a(2) &= (q/p) \, [1 - a(1)] \\
a(2) - a(3) &= (q/p)^2 \, [1 - a(1)] \\
&\;\;\vdots \\
a(z - 1) - a(z) &= (q/p)^{z-1} \, [1 - a(1)].
\end{aligned} \tag{8.16}$$

By taking into account the boundary conditions (8.14),

$$\sum_{n=1}^{z} [a(n - 1) - a(n)] = [1 - a(1)] + [a(1) - a(2)] + \cdots + [a(z - 2) - a(z - 1)] + [a(z - 1) - 0] = 1. \tag{8.17}$$

Using the finite geometric series (2.18) at page 48, equations (8.16) yield for $p \ne q$

$$\sum_{n=1}^{z} [a(n - 1) - a(n)] = [1 - a(1)] \sum_{n=1}^{z} (q/p)^{n-1} = [1 - a(1)] \, \frac{1 - (q/p)^z}{1 - q/p} = 1.$$

Solving this equation for $a(1)$ gives

$$a(1) = \frac{(q/p)^z - q/p}{(q/p)^z - 1}.$$

Starting with $a(0) = 1$ and $a(1)$, the systems of equations (8.16) or (8.13), respectively,
provide the complete set of absorption probabilities at state 0:

$$a(n) = \frac{(q/p)^z - (q/p)^n}{(q/p)^z - 1}, \quad n = 1, 2, \ldots, z, \quad p \ne q. \tag{8.18}$$

If $p = q = 1/2$, equations (8.16) show that all the differences $a(n - 1) - a(n)$ are equal
to $1 - a(1)$. Hence, equation (8.17) implies

$$a(n) = 1 - \frac{n}{z} = \frac{z - n}{z}, \quad n = 0, 1, \ldots, z, \quad p = q = 1/2.$$


The absorption probabilities b(n) of the particle at state z, when starting from state n, 
are given by 


b(n) =1-a(n), n=0,1,2,...,z. 
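The closed-form absorption probabilities can be cross-checked against the defining system (8.13) with boundary conditions (8.14). The following is a minimal sketch, assuming $q = 1 - p$ and the hypothetical values $z = 10$, $p = 0.45$; the function name is illustrative only.

```python
from fractions import Fraction as F

def absorption_at_zero(n, z, p):
    """a(n) from (8.18) for p != q, and a(n) = (z - n)/z for p = q = 1/2."""
    q = 1 - p
    if p == q:
        return F(z - n, z)
    rho = q / p
    return (rho ** z - rho ** n) / (rho ** z - 1)

z, p = 10, F(9, 20)          # hypothetical values: z = 10, p = 0.45
q = 1 - p
a = [absorption_at_zero(n, z, p) for n in range(z + 1)]
b = [1 - x for x in a]       # absorption probabilities at state z

# The closed form must satisfy the linear system (8.13) exactly.
recursion_ok = all(a[n] == p * a[n + 1] + q * a[n - 1] for n in range(1, z))
print(a[0], a[z], recursion_ok)
print([round(float(x), 4) for x in a])
```

Because exact fractions are used, the check of (8.13) is an identity test rather than a comparison up to rounding error.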


Time till absorption Let $m(n)$ be the mean time till the particle reaches one of the
absorbing states 0 or $z$, when starting from state $n$, $1 \le n \le z - 1$. If the first jump goes
from the starting point $n$ to the right, then the mean time till absorption is $1 + m(n + 1)$.
When the first jump goes to the left, then the mean time till absorption is $1 + m(n - 1)$.
Hence, the $m(n)$ satisfy the system of equations

$$m(n) = p \, [1 + m(n + 1)] + q \, [1 + m(n - 1)]; \quad n = 1, 2, \ldots, z - 1, \tag{8.19}$$

with the boundary conditions

$$m(0) = m(z) = 0.$$

Since $p + q = 1$, (8.19) is equivalent to

$$m(n) - m(n + 1) = \frac{q}{p} \, [m(n - 1) - m(n)] + \frac{1}{p}, \quad n = 1, 2, \ldots, z - 1.$$

Apart from the additional constant term $1/p$, this system has the same structure as (8.15),
so that it can be solved analogously. Taking into account the boundary conditions
$m(0) = m(z) = 0$, its solution is

$$m(n) = \frac{1}{p - q} \left[ z \, \frac{1 - (q/p)^n}{1 - (q/p)^z} - n \right] \quad \text{if } p \ne q, \tag{8.20}$$

$$m(n) = n \, (z - n) \quad \text{if } p = q = 1/2$$

for n = 1,2,...,z—1. Table 8.1 shows some numerical results. In particular for large z, 
even small changes in p have a significant impact on the absorption probabilities. 
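The mean times to absorption can be evaluated and checked against the system (8.19) in the same way as the absorption probabilities. The following sketch assumes $q = 1 - p$ and the hypothetical values $z = 10$, $p = 0.45$; it is not a reproduction of Table 8.1.

```python
from fractions import Fraction as F

def mean_absorption_time(n, z, p):
    """m(n) from (8.20) for p != q, and m(n) = n (z - n) for p = q = 1/2."""
    q = 1 - p
    if p == q:
        return F(n * (z - n))
    rho = q / p
    return (z * (1 - rho ** n) / (1 - rho ** z) - n) / (p - q)

z, p = 10, F(9, 20)          # hypothetical values: z = 10, p = 0.45
q = 1 - p
m = [mean_absorption_time(n, z, p) for n in range(z + 1)]

# The closed form must satisfy (8.19) and the boundary conditions exactly.
system_ok = all(m[n] == p * (1 + m[n + 1]) + q * (1 + m[n - 1])
                for n in range(1, z))
print(m[0], m[z], system_ok)
print([round(float(x), 2) for x in m])
```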


Table 8.1 Probabilities and mean times to absorption for example 8.3 


Gambler's ruin: The random walk with two absorbing barriers has a famous
interpretation: A gambler has an initial capital of \$$n$. After each game his capital has
increased by \$1 with probability $p$ (win) or decreased by \$1 (loss) with probability $q$.
The gambler has decided to stop gambling when having lost the initial capital or
when having reached a total capital of \$$z$, $0 < n < z$. When following this strategy,
the gambler will lose all of his initial capital with probability $a(n)$ given by (8.18) or
will walk away with a total capital of \$$z$ with probability $b(n) = 1 - a(n)$. □


Example 8.4 (electron orbits) Depending on its energy, an electron circles around
the atomic nucleus in one of a countably infinite set of trajectories $\{1, 2, \ldots\}$. The
one-step transition from trajectory $i$ to trajectory $j$ occurs with probability

$$p_{ij} = a_i \, e^{-b|i - j|}, \quad b > 0.$$

Hence, the two-step transition probabilities are

$$p_{ij}^{(2)} = a_i \sum_{k=1}^{\infty} a_k \, e^{-b(|i - k| + |k - j|)}.$$

The $a_i$ cannot be chosen arbitrarily. In view of (8.2), they must satisfy the condition

$$a_i \left( e^{-b(i-1)} + e^{-b(i-2)} + \cdots + e^{-b} \right) + a_i \sum_{k=0}^{\infty} e^{-bk} = 1,$$

or, equivalently,

$$a_i \, e^{-b} \, \frac{1 - e^{-b(i-1)}}{1 - e^{-b}} + a_i \, \frac{1}{1 - e^{-b}} = 1.$$

Therefore,

$$a_i = \frac{e^{b} - 1}{1 + e^{b} - e^{-b(i-1)}}; \quad i = 1, 2, \ldots$$

The structure of the $p_{ij}$ implies that $a_i = p_{ii}$ for all $i = 1, 2, \ldots$. □


Example 8.5 (dynamics of traffic accidents) Let $X_n$ denote the number of traffic
accidents over a period of $n$ weeks in a particular area, and let $Y_i$ be the corresponding
number in the $i$th week. Then, $X_n = \sum_{i=1}^{n} Y_i$.

The $Y_i$ are assumed to be independent and identically distributed as a random variable
$Y$ with probability distribution $\{q_k = P(Y = k); \; k = 0, 1, \ldots\}$. Then $\{X_1, X_2, \ldots\}$ is
a Markov chain with state space $Z = \{0, 1, \ldots\}$ and transition probabilities

$$p_{ij} = \begin{cases} q_k & \text{if } j = i + k; \; k = 0, 1, \ldots, \\ 0 & \text{otherwise.} \end{cases}$$

□


Example 8.6 (reproduction of diploid cells) Chromosomes determine the hereditary 
features of higher organisms. Essentially they consist of strings of genes. The position 
of a gene within a chromosome is called its locus. The different types of genes, which
can be found at a locus, are called alleles. The chromosomes of mammals occur in 
pairs (two strings of chromosomes 'in parallel’). For example, mammals have these 
diploid chromosomes. If, in the diploid case, the possible alleles are g and G, then at 
a locus the combinations (g,g), (g, G), or (G,G) are possible. Such a combination is 
called a genotype. Note that (g,G)=(G,g). 

Consider a one-sex population with an infinite (very large) number of individuals. 
All of them have genotype (g,g), (g, G), or (G,G). Each individual is equally likely to 
pair with any other member of the population, and, when pairing, each individual ran- 
domly gives one of its alleles to its offspring. Genotypes (g,g) and (G,G) can only 
contribute g or G, respectively, whereas (g,G) contributes g or G with probability 1/2 
each to the offspring. 


Let $\alpha_0$, $\beta_0$, and $\gamma_0$ with $\alpha_0 + \beta_0 + \gamma_0 = 1$ be the probabilities that an individual,
randomly selected from the first generation, belongs to genotype (g,g), (g,G), or (G,G),
respectively. By the formula of the total probability, a randomly chosen allele from
the first generation is of type g with probability

$$P_1(g) = P_1(g \mid gg) \, \alpha_0 + P_1(g \mid gG) \, \beta_0 + P_1(g \mid GG) \, \gamma_0 = \alpha_0 + \beta_0/2,$$

since $P_1(g \mid gg) = 1$, $P_1(g \mid gG) = 1/2$, and $P_1(g \mid GG) = 0$.
By changing the roles of g and G,

$$P_1(G) = P_1(G \mid GG) \, \gamma_0 + P_1(G \mid gG) \, \beta_0 + P_1(G \mid gg) \, \alpha_0 = \gamma_0 + \beta_0/2.$$

Hence, a randomly selected individual of the second generation has genotype (g,g),
(g,G), or (G,G) with respective probabilities $\alpha$, $\beta$, and $\gamma$ given by

$$\begin{aligned}
\alpha &= (\alpha_0 + \beta_0/2)^2, \\
\beta &= 2 \, (\alpha_0 + \beta_0/2) \, (\gamma_0 + \beta_0/2), \\
\gamma &= (\gamma_0 + \beta_0/2)^2,
\end{aligned} \tag{8.21}$$

which satisfy $\alpha + \beta + \gamma = 1$. Thus, the respective probabilities that an allele chosen
at random from the second generation is of type g or G are

$$P_2(g) = \alpha + \beta/2 = (\alpha_0 + \beta_0/2)^2 + (\alpha_0 + \beta_0/2) \, (\gamma_0 + \beta_0/2) = \alpha_0 + \beta_0/2 = P_1(g),$$

$$P_2(G) = \gamma + \beta/2 = (\gamma_0 + \beta_0/2)^2 + (\alpha_0 + \beta_0/2) \, (\gamma_0 + \beta_0/2) = \gamma_0 + \beta_0/2 = P_1(G).$$


Corollary Under the assumption of random mating, the respective percentages of the
population belonging to genotype (g,g), (g,G), or (G,G) stay at levels $100\alpha\,\%$,
$100\beta\,\%$, and $100\gamma\,\%$ in all successive generations.


In the literature on population genetics, this result is known as the Hardy-Weinberg 
law; see Hardy (1908). A relationship between this law and discrete-time Markov 
chains is readily established: Let $X_2$ be the genotype of an individual chosen at random
from the second generation, and $X_3, X_4, \ldots$ be the genotypes of its offspring in the
following generations. Then the state space of the Markov chain $\{X_2, X_3, \ldots\}$ is

$$Z = \{z_1 = gg, \; z_2 = gG, \; z_3 = GG\}$$

with the absolute state probabilities

$$\alpha = P(X_i = z_1), \quad \beta = P(X_i = z_2), \quad \gamma = P(X_i = z_3), \quad i = 2, 3, \ldots$$


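The one-step transition probabilities $p_{ij}$, $i, j = 1, 2, 3$, are determined by conditioning
with regard to the genotype $M$ of the randomly selected mate, e.g.:

$$\begin{aligned}
p_{11} &= (p_{11} \mid M = z_1) \cdot P(M = z_1) + (p_{11} \mid M = z_2) \cdot P(M = z_2) + (p_{11} \mid M = z_3) \cdot P(M = z_3) \\
&= 1 \cdot \alpha + \beta/2 + 0 \cdot \gamma = \alpha + \beta/2,
\end{aligned}$$

$$\begin{aligned}
p_{12} &= (p_{12} \mid M = z_1) \cdot P(M = z_1) + (p_{12} \mid M = z_2) \cdot P(M = z_2) + (p_{12} \mid M = z_3) \cdot P(M = z_3) \\
&= 0 + \beta/2 + \gamma = \gamma + \beta/2,
\end{aligned}$$

$$p_{13} = 1 - p_{11} - p_{12} = 1 - \alpha - \beta/2 - \gamma - \beta/2 = 0,$$

$$\begin{aligned}
p_{21} &= (p_{21} \mid M = z_1) \cdot P(M = z_1) + (p_{21} \mid M = z_2) \cdot P(M = z_2) + (p_{21} \mid M = z_3) \cdot P(M = z_3) \\
&= \alpha/2 + \beta/4 + 0 \cdot \gamma = \alpha/2 + \beta/4,
\end{aligned}$$

$$p_{22} = \alpha/2 + \beta/2 + \gamma/2 = 1/2 \quad (\text{since } \alpha + \beta + \gamma = 1),$$

$$p_{23} = 1 - p_{21} - p_{22} = 1 - \alpha/2 - \beta/4 - 1/2 = \beta/4 + \gamma/2.$$

The complete one-step transition matrix of the Markov chain $\{X_2, X_3, \ldots\}$ is

$$P = \begin{pmatrix}
\alpha + \beta/2 & \gamma + \beta/2 & 0 \\
\alpha/2 + \beta/4 & 1/2 & \beta/4 + \gamma/2 \\
0 & \alpha + \beta/2 & \gamma + \beta/2
\end{pmatrix} \tag{8.22}$$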


In view of its property to generate the same absolute state probabilities in all
generations following the first one,

$$\pi = \{\pi_1 = \alpha, \; \pi_2 = \beta, \; \pi_3 = \gamma\}$$

is a stationary initial distribution of the homogeneous Markov chain $\{X_2, X_3, \ldots\}$. This
can be verified by showing that $\pi$ satisfies the system of linear equations (8.9) if the
transition probabilities $p_{ij}$ are given by the matrix (8.22) (exercise 8.8). □
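The Hardy-Weinberg law and the stationarity of $\pi$ can be illustrated numerically. The sketch below assumes hypothetical first-generation genotype probabilities; the function names are not from the text. It applies (8.21) twice and checks the system (8.9) for the matrix (8.22).

```python
from fractions import Fraction as F

def next_generation(a0, b0, c0):
    """Genotype probabilities (8.21) of the next generation under random mating."""
    g = a0 + b0 / 2          # probability that a randomly chosen allele is g
    G = c0 + b0 / 2          # probability that a randomly chosen allele is G
    return g * g, 2 * g * G, G * G

def transition_matrix(a, b, c):
    """One-step transition matrix (8.22) over the genotypes (gg, gG, GG)."""
    return [[a + b / 2,     c + b / 2, F(0)],
            [a / 2 + b / 4, F(1, 2),   b / 4 + c / 2],
            [F(0),          a + b / 2, c + b / 2]]

# Hypothetical first-generation probabilities (alpha_0, beta_0, gamma_0).
first = (F(1, 2), F(1, 5), F(3, 10))
second = next_generation(*first)
third = next_generation(*second)

P = transition_matrix(*second)
pi_P = tuple(sum(second[i] * P[i][j] for i in range(3)) for j in range(3))
print(second, second == third, pi_P == second)
```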


Example 8.7 (sequence of moving averages) Let $\{Y_i; \; i = 0, 1, \ldots\}$ be a sequence of
independent, identically distributed binary random variables with

$$P(Y_i = 1) = P(Y_i = -1) = 1/2.$$

Moving averages $X_n$ are defined as follows (see also page 240):

$$X_n = \tfrac{1}{2} (Y_n + Y_{n-1}); \quad n = 1, 2, \ldots$$

$X_n$ has range $\{-1, 0, +1\}$ and probability distribution

$$\left\{ P(X_n = -1) = \tfrac{1}{4}, \quad P(X_n = 0) = \tfrac{1}{2}, \quad P(X_n = +1) = \tfrac{1}{4} \right\}.$$

Since $X_n$ and $X_{n+m}$ are independent for $m > 1$, the corresponding matrix of the $m$-step
transition probabilities $p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i)$ is, with rows and columns
ordered as $-1, 0, +1$,

$$P^{(m)} = \begin{pmatrix}
1/4 & 1/2 & 1/4 \\
1/4 & 1/2 & 1/4 \\
1/4 & 1/2 & 1/4
\end{pmatrix}, \quad m > 1.$$

The matrix of the one-step transition probabilities $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ is

$$P^{(1)} = P = \begin{pmatrix}
1/2 & 1/2 & 0 \\
1/4 & 1/2 & 1/4 \\
0 & 1/2 & 1/2
\end{pmatrix}.$$

Since

$$P^{(1)} \cdot P^{(1)} \ne P^{(2)},$$

the Chapman-Kolmogorov equations do not hold. Therefore, the sequence of moving
averages $\{X_1, X_2, \ldots\}$ cannot be a Markov chain. □
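The one-step and two-step matrices of this example can be obtained by brute-force enumeration of the underlying binary variables, which also makes the failure of the Chapman-Kolmogorov equations explicit. The following is a minimal sketch; the function names are illustrative only.

```python
from fractions import Fraction as F
from itertools import product

STATES = [-1, 0, 1]

def conditional_matrix(m):
    """P(X_{n+m} = j | X_n = i) for m in {1, 2}, by enumerating
    (Y_{n-1}, Y_n, Y_{n+1}, Y_{n+2}); all 16 outcomes are equally likely."""
    counts = {(i, j): 0 for i in STATES for j in STATES}
    for y in product([-1, 1], repeat=4):
        x_now = (y[1] + y[0]) // 2              # X_n = (Y_n + Y_{n-1}) / 2
        x_later = (y[1 + m] + y[m]) // 2        # X_{n+m} = (Y_{n+m} + Y_{n+m-1}) / 2
        counts[(x_now, x_later)] += 1
    return [[F(counts[(i, j)], sum(counts[(i, k)] for k in STATES))
             for j in STATES] for i in STATES]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P1 = conditional_matrix(1)
P2 = conditional_matrix(2)
P1_squared = mat_mul(P1, P1)
print(P1_squared[0], P2[0], P1_squared == P2)
```

The first row of the squared one-step matrix is (3/8, 1/2, 1/8), whereas the first row of the two-step matrix is (1/4, 1/2, 1/4).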
8.2.1 Closed Sets of States


A subset $C$ of the state space $Z$ of a Markov chain is said to be closed if

$$\sum_{j \in C} p_{ij} = 1 \quad \text{for all } i \in C. \tag{8.23}$$

If a Markov chain is in a closed set of states, then it cannot leave this set since (8.23)
is equivalent to $p_{ij} = 0$ for all $i \in C$, $j \notin C$. Furthermore, (8.23) implies that

$$p_{ij}^{(m)} = 0 \quad \text{for all } i \in C, \; j \notin C \text{ and } m \ge 1. \tag{8.24}$$

For $m = 2$ formula (8.24) can be proved as follows: From (8.5),

$$p_{ij}^{(2)} = \sum_{k \in C} p_{ik} \, p_{kj} + \sum_{k \notin C} p_{ik} \, p_{kj} = 0,$$

since $j \notin C$ implies $p_{kj} = 0$ in the first sum and $p_{ik} = 0$ in the second sum. Now
formula (8.24) follows inductively from the Chapman-Kolmogorov equations.


A closed set of states is called minimal if it does not contain a proper closed subset. 
In particular, a Markov chain is said to be irreducible if its state space Z is minimal. 
Otherwise the Markov chain is reducible. 


A state $i$ is said to be absorbing if $p_{ii} = 1$. Thus, if a Markov chain has arrived at an
absorbing state, it cannot leave this state anymore. Hence, an absorbing state is a
minimal closed set of states. Absorbing barriers of a random walk (example 8.3) are
absorbing states.


Example 8.8 Let $Z = \{1, 2, 3, 4, 5\}$ be the state space of a Markov chain with
transition matrix

$$P = \begin{pmatrix}
0.2 & 0 & 0.5 & 0.3 & 0 \\
0.1 & 0 & 0.9 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0.4 & 0.1 & 0.2 & 0 & 0.3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

It is helpful to illustrate the possible transitions between the states of a Markov chain
by transition graphs. The nodes of these graphs represent the states of the Markov
chain. A directed edge from node $i$ to node $j$ exists if and only if $p_{ij} > 0$, that is if a
one-step transition from state $i$ to state $j$ is possible. The corresponding one-step
transition probabilities are attached to the edges. Figure 8.1 shows that $\{1, 2, 3, 4\}$ is not
a closed set of states since condition (8.24) is not fulfilled for $i = 4$. State 5 is absorbing
so that $\{5\}$ is a minimal closed set of states. This Markov chain is, therefore,
reducible. □

Figure 8.1 Transition graph in example 8.8
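Condition (8.23) is easy to test mechanically for any candidate set of states. The sketch below uses the transition matrix of example 8.8 as read from the text; the function name `is_closed` and the use of exact fractions for the decimal entries are choices made only for this illustration.

```python
from fractions import Fraction as F

def is_closed(P, C, states):
    """Condition (8.23): each row i in C puts total probability 1 on C."""
    index = {s: k for k, s in enumerate(states)}
    return all(sum(P[index[i]][index[j]] for j in C) == 1 for i in C)

states = [1, 2, 3, 4, 5]
P = [[F(2, 10), 0,        F(5, 10), F(3, 10), 0],
     [F(1, 10), 0,        F(9, 10), 0,        0],
     [0,        1,        0,        0,        0],
     [F(4, 10), F(1, 10), F(2, 10), 0,        F(3, 10)],
     [0,        0,        0,        0,        1]]

closed_1234 = is_closed(P, {1, 2, 3, 4}, states)   # state 4 can jump to state 5
closed_5 = is_closed(P, {5}, states)               # state 5 is absorbing
closed_all = is_closed(P, set(states), states)     # the state space is always closed
print(closed_1234, closed_5, closed_all)
```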


8.2.2 Equivalence Classes 


State $j$ is said to be accessible from state $i$ (symbolically: $i \Rightarrow j$) if there exists an
$m \ge 1$ such that $p_{ij}^{(m)} > 0$. The relation '$\Rightarrow$' is transitive:

If $i \Rightarrow k$ and $k \Rightarrow j$, there exist $m > 0$ and $n > 0$ with $p_{ik}^{(m)} > 0$ and $p_{kj}^{(n)} > 0$. Hence,

$$p_{ij}^{(m+n)} = \sum_{r \in Z} p_{ir}^{(m)} \, p_{rj}^{(n)} \ge p_{ik}^{(m)} \, p_{kj}^{(n)} > 0.$$

Consequently, $i \Rightarrow k$ and $k \Rightarrow j$ imply $i \Rightarrow j$, i.e., the transitivity of '$\Rightarrow$'.

The set $M(i) = \{k, \; i \Rightarrow k\}$ consisting of all those states which are accessible from $i$ is
closed. To prove this assertion one has to show that $k \in M(i)$ and $j \notin M(i)$ imply $k \nRightarrow j$.
The proof is carried out indirectly: If under the assumptions stated $k \Rightarrow j$, then $i \Rightarrow k$
and the transitivity would imply $i \Rightarrow j$. But this contradicts the definition of $M(i)$.


If both $i \Rightarrow j$ and $j \Rightarrow i$ hold, then $i$ and $j$ are said to communicate (symbolically:
$i \Leftrightarrow j$). Communication '$\Leftrightarrow$' is an equivalence relation since it satisfies the three
characteristic properties:

(1) $i \Leftrightarrow i$. (reflexivity)
(2) If $i \Leftrightarrow j$, then $j \Leftrightarrow i$. (symmetry)
(3) If $i \Leftrightarrow j$ and $j \Leftrightarrow k$, then $i \Leftrightarrow k$. (transitivity)

Properties (1) and (2) are an immediate consequence of the definition of '$\Leftrightarrow$'. To
verify property (3), note that $i \Leftrightarrow j$ and $j \Leftrightarrow k$ imply the existence of $m$ and $n$ so that
$p_{ij}^{(m)} > 0$ and $p_{jk}^{(n)} > 0$, respectively. Hence, by (8.5),

$$p_{ik}^{(m+n)} = \sum_{r \in Z} p_{ir}^{(m)} \, p_{rk}^{(n)} \ge p_{ij}^{(m)} \, p_{jk}^{(n)} > 0.$$

Likewise, there exist $M$ and $N$ with

$$p_{ki}^{(M+N)} \ge p_{kj}^{(M)} \, p_{ji}^{(N)} > 0,$$

so that the transitivity is proved.


The equivalence relation '$\Leftrightarrow$' partitions state space $Z$ into disjoint, but not necessarily
closed classes in the following way: Two states $i$ and $j$ belong to the same class if
and only if they communicate. In what follows, the class containing state $i$ is denoted
as $C(i)$. Clearly, any state in a class can be used to characterize this class. All properties
of states introduced in what follows will be class properties, i.e. if state $i$ has one
of these properties, all states in $C(i)$ have this property as well.

A state $i$ is called essential if any state $j$ which is accessible from $i$ has the property
that $i$ is also accessible from $j$. In this case, $C(i)$ is called an essential class.

A state $i$ is called inessential if it is not essential. In this case, $C(i)$ is called an inessential
class. If $i$ is inessential, then there exists a state $j$ for which $i \Rightarrow j$ and $j \nRightarrow i$.

It is easily verified that essential and inessential are indeed class properties. In example
8.8, the states 1, 2, 3 and 4 are inessential since state 5 is accessible from each
of these states but none of the states 1, 2, 3 or 4 is accessible from state 5.
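For a finite chain, accessibility, communicating classes and essential states can be determined from the pattern of positive entries of the transition matrix alone. The following is a minimal sketch using the matrix of example 8.8 as read from the text; the function names are hypothetical, and the transitive closure is computed with Warshall's algorithm, a standard technique that the text itself does not mention.

```python
def accessibility(P):
    """reach[i][j] is True if j is accessible from i in m >= 1 steps."""
    n = len(P)
    reach = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                      # Warshall's transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def classes_and_essential(P):
    """Communicating classes and the set of essential states (0-based indices)."""
    n = len(P)
    reach = accessibility(P)
    classes = []
    for i in range(n):
        cls = {j for j in range(n) if j == i or (reach[i][j] and reach[j][i])}
        if cls not in classes:
            classes.append(cls)
    essential = {i for i in range(n)
                 if all(reach[j][i] for j in range(n) if reach[i][j])}
    return classes, essential

# Transition matrix of example 8.8; index k stands for state k + 1.
P = [[0.2, 0,   0.5, 0.3, 0],
     [0.1, 0,   0.9, 0,   0],
     [0,   1,   0,   0,   0],
     [0.4, 0.1, 0.2, 0,   0.3],
     [0,   0,   0,   0,   1]]

classes, essential = classes_and_essential(P)
class_labels = [sorted(k + 1 for k in c) for c in classes]
essential_labels = sorted(k + 1 for k in essential)
print(class_labels)       # [[1, 2, 3, 4], [5]]
print(essential_labels)   # [5]
```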


Theorem 8.1 (1) Essential classes are minimal closed classes. (2) Inessential classes
are not closed.

Proof (1) The assertion is a direct consequence of the definition of essential classes.
(2) If $i$ is inessential, then there is a state $j$ with $i \Rightarrow j$ and $j \nRightarrow i$. Hence, $j \notin C(i)$.
Assuming $C(i)$ is closed implies that $p_{kj}^{(m)} = 0$ for all $m \ge 1$, $k \in C(i)$ and $j \notin C(i)$.
But, according to the definition of the relation $i \Rightarrow j$, there exists a positive integer $m$
with $p_{ij}^{(m)} > 0$. Therefore, $C(i)$ cannot be closed. ∎

Let $p_i^{(m)}(C)$ be the probability that the Markov chain, starting from state $i$, is in state
set $C$ after $m$ time units:

$$p_i^{(m)}(C) = \sum_{j \in C} p_{ij}^{(m)}.$$

Furthermore, let $C_w$ and $C_u$ be the sets of all essential and inessential states of a
Markov chain. The following theorem asserts that a Markov chain with finite state
space, which starts from an inessential state, will leave the set of inessential states
with probability 1 and never return (for a proof see e.g. Chung (1960)). This theorem
justifies the terms essential and inessential states. However, depending on the
transition probabilities, the Markov chain may in the initial phase return more or less
frequently to the set of inessential states if it has started there.


Theorem 8.2 Let the state space $Z$ be finite. Then,

$$\lim_{m \to \infty} p_i^{(m)}(C_u) = 0. \qquad ∎$$

8.2.3 Periodicity

Let $d_i$ be the greatest common divisor of those indices $m \ge 1$ for which $p_{ii}^{(m)} > 0$.
Then $d_i$ is said to be the period of state $i$. If

$$p_{ii}^{(m)} = 0 \quad \text{for all } m > 0,$$

then the period of $i$ is defined to be infinite. A state $i$ is said to be aperiodic if $d_i = 1$.

If $i$ has period $d_i$, then $p_{ii}^{(m)} > 0$ can only hold if $m$ can be represented in the form
$m = n \cdot d_i$; $n = 1, 2, \ldots$. Hence, returning to state $i$ is only possible after a number
of steps which is a multiple of $d_i$. The following theorem shows that the period is
a class property.


Theorem 8.3 All states of a class have the same period.

Proof Let $i \Leftrightarrow j$. Then there exist integers $m$ and $n$ with $p_{ij}^{(m)} > 0$ and $p_{ji}^{(n)} > 0$. If
the inequality $p_{ii}^{(r)} > 0$ holds for a positive integer $r$, then, from (8.5),

$$p_{jj}^{(n+r+m)} \ge p_{ji}^{(n)} \, p_{ii}^{(r)} \, p_{ij}^{(m)} > 0.$$

Since $p_{ii}^{(2r)} \ge p_{ii}^{(r)} \cdot p_{ii}^{(r)} > 0$, this inequality also holds if $r$ is replaced with $2r$:

$$p_{jj}^{(n+2r+m)} > 0.$$

Thus, $d_j$ divides the difference $(n + 2r + m) - (n + r + m) = r$. Since this holds for all $r$
for which $p_{ii}^{(r)} > 0$, $d_j$ must divide $d_i$. Changing the roles of $i$ and $j$ shows that $d_i$
also divides $d_j$. Thus, $d_i = d_j$, which completes the proof. ∎
Example 8.9 A Markov chain has state space $Z = \{0, 1, \ldots, 6\}$ and transition matrix

$$P = \begin{pmatrix}
1/3 & 2/3 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 1/3 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 1/2 & 0 & 1/2
\end{pmatrix}$$

Clearly, $\{0, 1, 2\}$ is a closed set of essential states. State 4 is absorbing, so $\{4\}$ is
another closed set. Having once arrived in a closed set of states the Markov chain cannot
leave it anymore. $\{3, 5, 6\}$ is a set of inessential states. When starting in one of
its inessential states, the Markov chain will at some stage leave this set and
never return. All states in $\{0, 1, 2\}$ have period 1. □

Theorem 8.4 (Chung (1960)) The state space $Z$ of an irreducible Markov chain with
period $d > 1$ can be partitioned into disjoint subsets $Z_1, Z_2, \ldots, Z_d$ in such a way that
from any state $i \in Z_k$ a transition can only be made to a state $j \in Z_{k+1}$. (By agreement,
$j \in Z_1$ if $i \in Z_d$.) ∎


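Example 8.10 Theorem 8.4 is illustrated by a discrete-time Markov chain with state
space $Z = \{0, 1, \ldots, 5\}$ and transition matrix

$$P = \begin{pmatrix}
0 & 0 & 2/5 & 3/5 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 2/3 & 1/3 \\
1/2 & 1/2 & 0 & 0 & 0 & 0 \\
1/4 & 3/4 & 0 & 0 & 0 & 0
\end{pmatrix}$$

This Markov chain has period $d = 3$. One-step transitions between the states are possible
in the order $Z_1 = \{0, 1\} \to Z_2 = \{2, 3\} \to Z_3 = \{4, 5\} \to Z_1$. The three-step
transition matrix $P^{(3)} = P^3$ is

$$P^{(3)} = \begin{pmatrix}
2/5 & 3/5 & 0 & 0 & 0 & 0 \\
3/8 & 5/8 & 0 & 0 & 0 & 0 \\
0 & 0 & 31/40 & 9/40 & 0 & 0 \\
0 & 0 & 3/4 & 1/4 & 0 & 0 \\
0 & 0 & 0 & 0 & 11/20 & 9/20 \\
0 & 0 & 0 & 0 & 21/40 & 19/40
\end{pmatrix}$$

□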

8.2.4 Recurrence and Transience 


This section deals with the return of a Markov chain to an initial state. Such returns
are controlled by the first-passage time probabilities

$$f_{ij}^{(m)} = P(X_m = j; \; X_k \ne j, \; k = 1, 2, \ldots, m - 1 \mid X_0 = i); \quad i, j \in Z.$$

Thus, $f_{ij}^{(m)}$ is the probability that the Markov chain, starting from state $i$, makes its
first transition into state $j$ after $m$ steps. Recall that $p_{ij}^{(m)}$ is the probability that the
Markov chain, starting from state $i$, is in state $j$ after $m$ steps, but it may have been in
state $j$ in between. For $m = 1$,

$$f_{ij}^{(1)} = p_{ij}^{(1)}.$$

The total probability rule yields a relationship between the $m$-step transition probabilities
and the first-passage time probabilities

$$p_{ij}^{(m)} = \sum_{k=1}^{m} f_{ij}^{(k)} \, p_{jj}^{(m-k)},$$

where, by convention,

$$p_{jj}^{(0)} = 1 \quad \text{for all } j \in Z.$$

Thus, the first-passage time probability can be determined recursively from the
following formula

$$f_{ij}^{(m)} = p_{ij}^{(m)} - \sum_{k=1}^{m-1} f_{ij}^{(k)} \, p_{jj}^{(m-k)}; \quad m = 2, 3, \ldots \tag{8.25}$$
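The recursion (8.25) can be implemented directly once the $m$-step transition matrices are available. The following is a minimal sketch for a hypothetical two-state chain; the function names are not from the text. For this chain the first passage from state 0 to state 1 after exactly $m$ steps requires $m - 1$ stays in state 0, so that $f_{01}^{(m)} = (1/2)^m$, which gives an independent check.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_passage(P, i, j, M):
    """First-passage time probabilities f_ij^(m), m = 1..M, via recursion (8.25)."""
    n = len(P)
    powers = [[[F(int(a == b)) for b in range(n)] for a in range(n)]]   # P^(0)
    for _ in range(M):
        powers.append(mat_mul(powers[-1], P))
    f = [None] * (M + 1)                     # f[0] is unused
    for m in range(1, M + 1):
        f[m] = powers[m][i][j] - sum(f[k] * powers[m - k][j][j]
                                     for k in range(1, m))
    return f[1:]

# Hypothetical two-state chain (illustration only).
P = [[F(1, 2), F(1, 2)],
     [F(1, 3), F(2, 3)]]

f01 = first_passage(P, 0, 1, 4)
f00 = first_passage(P, 0, 0, 3)
print(f01, f00)
```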


The random variable $Y_{ij}$ with probability distribution $\{f_{ij}^{(m)}; \; m = 1, 2, \ldots\}$ is a
first-passage time. Its mean value is

$$\mu_{ij} = E(Y_{ij}) = \sum_{m=1}^{\infty} m \, f_{ij}^{(m)}.$$

The probability of ever making a transition into state $j$ if the process starts in state $i$ is

$$f_{ij} = \sum_{m=1}^{\infty} f_{ij}^{(m)}. \tag{8.26}$$

In particular, $f_{ii}$ is the probability of ever returning to state $i$. This motivates the
introduction of the following concepts:

A state $i$ is said to be recurrent if $f_{ii} = 1$ and transient if $f_{ii} < 1$.

Clearly, if state $i$ is transient, then $\mu_{ii} = \infty$. But, if $i$ is recurrent, then $\mu_{ii} = \infty$ is also
possible. Therefore, recurrent states are subdivided as follows:

A recurrent state $i$ is said to be positive recurrent if $\mu_{ii} < \infty$ and null recurrent
if $\mu_{ii} = \infty$. An aperiodic and positive recurrent state is called ergodic.


The random time points

$$T_{i,n}; \quad n = 1, 2, \ldots,$$

at which the $n$th return into starting state $i$ occurs, are renewal points within a Markov
chain. By convention, $T_{i,0} = 0$. The time spans between neighboring renewal points

$$T_{i,n} - T_{i,n-1}; \quad n = 1, 2, \ldots$$

are called recurrence times. They are independent and identically distributed as $Y_{ii}$.
Therefore, the sequence of recurrence times constitutes an ordinary renewal process.
Let

$$N_i(t) = \max(n; \; T_{i,n} \le t) \quad \text{and} \quad N_i(\infty) = \lim_{t \to \infty} N_i(t)$$

with corresponding mean values

$$H_i(t) = E(N_i(t)) \quad \text{and} \quad H_i(\infty) = \lim_{t \to \infty} H_i(t).$$

Theorem 8.5 State $i$ is recurrent if and only if
(1) $H_i(\infty) = \infty$, or
(2) $\sum_{m=1}^{\infty} p_{ii}^{(m)} = \infty$.

Proof (1) If $i$ is recurrent, then $P(T_{i,n} = \infty) = 0$ for $n = 1, 2, \ldots$. The limit $N_i(\infty)$ is
finite if and only if there is a finite $n$ with $T_{i,n} = \infty$. Therefore,

$$P(N_i(\infty) < \infty) \le \sum_{n=1}^{\infty} P(T_{i,n} = \infty) = 0.$$

Thus, assumption $f_{ii} = 1$ implies $N_i(\infty) = \infty$ with probability 1 so that $H_i(\infty) = \infty$.
On the other hand, if $f_{ii} < 1$, then the Markov chain will not return to state $i$ with
positive probability $1 - f_{ii}$. In this case $N_i(\infty)$ has a geometric distribution with mean
value

$$E(N_i(\infty)) = H_i(\infty) = \frac{f_{ii}}{1 - f_{ii}} < \infty.$$

Both results together prove part (1) of the theorem.

(2) Let the indicator variable for the random event that the Markov chain is in state $i$
at time $t = m$ be

$$I_{m,i} = \begin{cases} 1 & \text{for } X_m = i, \\ 0 & \text{for } X_m \ne i; \end{cases} \quad m = 1, 2, \ldots$$

Then,

$$N_i(\infty) = \sum_{m=1}^{\infty} I_{m,i}.$$

Hence,

$$\begin{aligned}
H_i(\infty) &= E\left( \sum_{m=1}^{\infty} I_{m,i} \right) \\
&= \sum_{m=1}^{\infty} E(I_{m,i}) \\
&= \sum_{m=1}^{\infty} P(I_{m,i} = 1) \\
&= \sum_{m=1}^{\infty} p_{ii}^{(m)}.
\end{aligned}$$

Now assertion (2) follows from (1). ∎


By adding up both sides of (8.25) from $m = 1$ to $\infty$ and changing the order of summation
according to formula (2.115) at page 99, theorem 8.5 implies the

Corollary If state $j$ is transient, then, for any $i \in Z$,

$$\sum_{m=1}^{\infty} p_{ij}^{(m)} < \infty,$$

and, therefore,

$$\lim_{m \to \infty} p_{ij}^{(m)} = 0. \tag{8.27}$$

Theorem 8.6 Let $i$ be a recurrent state and $i \Leftrightarrow j$. Then state $j$ is recurrent, too.

Proof By definition of the equivalence relation '$i \Leftrightarrow j$', there are integers $m$ and $n$
with

$$p_{ij}^{(m)} > 0 \quad \text{and} \quad p_{ji}^{(n)} > 0.$$

By (8.5),

$$p_{jj}^{(n+r+m)} \ge p_{ji}^{(n)} \, p_{ii}^{(r)} \, p_{ij}^{(m)}$$

so that

$$\sum_{r=1}^{\infty} p_{jj}^{(n+r+m)} \ge p_{ij}^{(m)} \, p_{ji}^{(n)} \sum_{r=1}^{\infty} p_{ii}^{(r)} = \infty.$$

The assertion is now a consequence of theorem 8.5. ∎


Corollary Recurrence and transience are class properties. Hence, an irreducible Mar- 
kov chain is either recurrent or transient. In particular, an irreducible Markov chain 
with finite state space is recurrent. 


It is easy to see that an inessential state is transient. Therefore, each recurrent state is 
essential. But not each essential state is recurrent. This assertion is proved by the fol- 
lowing example. 


Example 8.11 (unbounded random walk) Starting from $x = 0$, a particle jumps a
unit distance along the x-axis to the right with probability $p$ or to the left with
probability $1 - p$. The transitions occur independently of each other. Let $X_n$ denote the
location of the particle after the $n$th jump under the initial condition $X_0 = 0$. Then the
Markov chain $\{X_0, X_1, \ldots\}$ has period $d = 2$. Thus,

$$p_{00}^{(2m+1)} = 0; \quad m = 0, 1, \ldots$$

To return to state $x = 0$ after $2m$ steps, the particle must jump $m$ times to the left and
$m$ times to the right. There are $\binom{2m}{m}$ sample paths which satisfy this condition. Hence,

$$p_{00}^{(2m)} = \binom{2m}{m} p^m (1 - p)^m; \quad m = 1, 2, \ldots$$

Letting $y = p(1 - p)$ and making use of the well-known series

$$\sum_{m=0}^{\infty} \binom{2m}{m} y^m = \frac{1}{\sqrt{1 - 4y}}, \quad -1/4 < y < 1/4,$$

yields

$$\sum_{m=0}^{\infty} p_{00}^{(2m)} = \frac{1}{\sqrt{(1 - 2p)^2}} = \frac{1}{|1 - 2p|}, \quad p \ne 1/2.$$

Thus, $\sum_{m=0}^{\infty} p_{00}^{(2m)} < \infty$ for all $p \ne 1/2$.

Hence, by theorem 8.5, state 0 is transient. But for any $p$ with $0 < p < 1$ all states are
essential, since there is always a positive probability of making a transition to any
state irrespective of the starting position. By the corollary from theorem 8.6, the
Markov chain $\{X_0, X_1, \ldots\}$ is transient, since it is irreducible.


If $p = 1/2$ (symmetric random walk), then

$$\sum_{m=0}^{\infty} p_{00}^{(2m)} = \lim_{p \to 1/2} \frac{1}{|1 - 2p|} = \infty. \qquad (8.28)$$

Therefore, in this case all states are recurrent. □
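
The convergence criterion of theorem 8.5 can be checked numerically for this random walk.
The following is only a minimal sketch: the function name `return_prob_partial_sum` and the
chosen values of $p$ and of the number of terms are illustrative assumptions, not part of the
text. It accumulates partial sums of $p_{00}^{(2m)}$, which should approach $1/|1-2p|$ for
$p \ne 1/2$ and keep growing for $p = 1/2$.

```python
def return_prob_partial_sum(p: float, terms: int) -> float:
    """Partial sum of p_00^(2m) = C(2m, m) * (p*(1-p))**m for m = 0, ..., terms-1."""
    y = p * (1.0 - p)
    total, term = 0.0, 1.0              # the term for m = 0 equals 1
    for m in range(terms):
        total += term
        # C(2m+2, m+1) / C(2m, m) = 2(2m+1)/(m+1), so the next term is obtained recursively
        term *= 2.0 * (2 * m + 1) / (m + 1) * y
    return total

# asymmetric walk (hypothetical p = 0.3): the series converges to 1/|1 - 2p| = 2.5
asym = return_prob_partial_sum(0.3, 2000)

# symmetric walk: the partial sums grow without bound (roughly like 2*sqrt(n/pi))
sym_small = return_prob_partial_sum(0.5, 100)
sym_large = return_prob_partial_sum(0.5, 10000)
```

The recursive update of the terms avoids evaluating large binomial coefficients directly.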


The symmetric random walk along a straight line can easily be generalized to n-dimen- 
sional Euclidian spaces: In the plane, the particle jumps one unit to the West, South, 
East, or North, respectively, each with probability 1/4. In the 3-dimensional Euclid- 
ian space, the particle jumps one unit to the West, South, East, North, up- or down- 
wards, respectively, each with probability 1/6. When analyzing these random walks 
analogously to the one-dimensional case, an interesting phenomenon becomes visible: 
the symmetric two-dimensional random walk (more exactly, the underlying Markov 
chain) is recurrent like the one-dimensional symmetric random walk, but all n-dimen- 
sional symmetric random walks with n > 2 are transient. Thus, there is a positive prob- 
ability that Jim, who randomly chooses one of the six possibilities in a 3-dimensional 
labyrinth, each with probability 1/6, will never return to his starting position. 


Example 8.12 A particle jumps from $x = i$ to $x = 0$ with probability $p_i$ or to $i + 1$
with probability

$$1 - p_i, \quad 0 < p_i < 1, \quad i = 0, 1, \ldots$$

The jumps are independent of each other. In terms of population dynamics, a population
increases by one individual at each jump with positive probability $1 - p_i$ if before
the jump it comprised $i$ individuals (state $i$). But at any state $i$ a disaster can wipe out
the whole population with probability $p_i$. (State 0 is, however, not absorbing.)


Let $X_n$ be the position of the particle after the $n$th jump. Then the transition matrix
of the Markov chain $\{X_0, X_1, \ldots\}$ is

$$\mathbf{P} = \begin{pmatrix}
p_0 & 1 - p_0 & 0 & 0 & \cdots \\
p_1 & 0 & 1 - p_1 & 0 & \cdots \\
p_2 & 0 & 0 & 1 - p_2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},$$

i.e. row $i$ has the entries $p_{i0} = p_i$ and $p_{i,i+1} = 1 - p_i$; all other entries are 0.


The Markov chain $\{X_0, X_1, \ldots\}$ is irreducible and aperiodic. Hence, for finding the
conditions under which this Markov chain is recurrent or transient it is sufficient to
consider state 0, say. It is not difficult to determine $f_{00}^{(m)}$:
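Starting with

$$f_{00}^{(1)} = p_0,$$

the $m$-step first return probabilities are

$$f_{00}^{(m)} = \Big(\prod_{i=0}^{m-2} (1 - p_i)\Big) p_{m-1}; \quad m = 2, 3, \ldots$$

If $p_{m-1}$ is replaced with $(1 - (1 - p_{m-1}))$, then $f_{00}^{(m)}$ becomes

$$f_{00}^{(m)} = \Big(\prod_{i=0}^{m-2} (1 - p_i)\Big) - \Big(\prod_{i=0}^{m-1} (1 - p_i)\Big); \quad m = 2, 3, \ldots$$

so that

$$\sum_{n=1}^{m+1} f_{00}^{(n)} = 1 - \prod_{i=0}^{m} (1 - p_i), \quad m = 1, 2, \ldots$$

Thus, state 0 is recurrent if and only if

$$\lim_{m \to \infty} \prod_{i=0}^{m} (1 - p_i) = 0. \qquad (8.29)$$

Proposition Condition (8.29) is true if and only if

$$\sum_{i=0}^{\infty} p_i = \infty. \qquad (8.30)$$
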
To prove this proposition, note that

$$1 - p_i \le e^{-p_i}; \quad i = 0, 1, \ldots$$

Hence,

$$\prod_{i=0}^{m} (1 - p_i) \le \exp\Big(-\sum_{i=0}^{m} p_i\Big).$$

Letting $m \to \infty$ proves that (8.29) follows from (8.30).


The converse direction is proved indirectly: The assumption that (8.29) is true and
(8.30) is wrong implies the existence of a positive integer $k$ satisfying

$$0 < \sum_{i=k}^{\infty} p_i < 1.$$

By induction,

$$\prod_{i=k}^{m} (1 - p_i) > 1 - p_k - p_{k+1} - \cdots - p_m = 1 - \sum_{i=k}^{m} p_i.$$

Therefore,

$$\lim_{m \to \infty} \prod_{i=k}^{m} (1 - p_i) \ge \lim_{m \to \infty} \Big(1 - \sum_{i=k}^{m} p_i\Big) > 0.$$

This contradicts the assumption that condition (8.29) is true, and, hence, completes
the proof of the proposition.


Thus, state 0 and with it the Markov chain are recurrent if and only if condition (8.30)
is true. This is the case, for instance, if $p_i = p > 0$; $i = 0, 1, \ldots$ □
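
The equivalence of (8.29) and (8.30) can be illustrated numerically. The sketch below is
not taken from the text: the helper `survival_product` and the two probability sequences
are hypothetical choices, one with a divergent and one with a convergent sum $\sum p_i$.

```python
def survival_product(p, m):
    """Return prod_{i=0}^{m} (1 - p(i)), i.e. 1 - sum_{n=1}^{m+1} f_00^(n)."""
    prod = 1.0
    for i in range(m + 1):
        prod *= 1.0 - p(i)
    return prod

# constant p_i = 0.1: sum p_i diverges, so the product tends to 0 (state 0 recurrent)
const = survival_product(lambda i: 0.1, 500)

# p_i = 1/(i+2)^2: sum p_i converges, and the product telescopes to
# (m+3) / (2(m+2)), which tends to 1/2 > 0 (state 0 transient)
summable = survival_product(lambda i: 1.0 / (i + 2) ** 2, 100000)
```

For the second sequence the probability of never returning to state 0 is therefore about
1/2, in line with the proposition.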
8.3 LIMIT THEOREMS AND STATIONARY DISTRIBUTION


Theorem 8.7 Let state $i$ and $j$ communicate, i.e. $i \leftrightarrow j$. Then,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} p_{ij}^{(m)} = \frac{1}{\mu_{jj}}. \qquad (8.31)$$

Proof Analogously to the proof of theorem 8.5 it can be shown that, given the Markov
chain is at state $i$ at time $t = 0$, the sum

$$\sum_{m=1}^{n} p_{ij}^{(m)}$$

is equal to the mean number of transitions into state $j$ in the time interval $(0, n]$. The
theorem is, therefore, a direct consequence of the elementary renewal theorem
(theorem 7.12, page 311). (If $i \ne j$, the corresponding renewal process is delayed.) ■


If the limit

$$\lim_{m \to \infty} p_{ij}^{(m)}$$

exists, then it coincides with the limit at the right-hand side of equation (8.31). Since
it can be shown that for an irreducible Markov chain these limits exist for all $i, j \in \mathbf{Z}$,
theorem 8.7 implies the


Corollary Let $p_{ij}^{(m)}$ be the $m$-step transition probabilities of an irreducible, aperiodic
Markov chain. Then,

$$\lim_{m \to \infty} p_{ij}^{(m)} = \frac{1}{\mu_{jj}}.$$

If state $j$ is transient or null-recurrent, then

$$\lim_{m \to \infty} p_{ij}^{(m)} = 0.$$

If the irreducible Markov chain has period $d > 1$, then

$$\lim_{m \to \infty} p_{jj}^{(md)} = \frac{d}{\mu_{jj}}.$$

To see this, switch from the one-step transition matrix $\mathbf{P}$ to the $d$-step transition
matrix $\mathbf{P}^{d}$. A proof of the following theorem is e.g. given in Feller (1968).


Theorem 8.8 For any irreducible, aperiodic Markov chain, there are two possibilities:
(1) If the Markov chain is transient or null recurrent, then a stationary distribution
does not exist.

(2) If the Markov chain is positive recurrent, then there exists a unique stationary
distribution $\{\pi_j, \; j \in \mathbf{Z}\}$, which for any $i \in \mathbf{Z}$ is given by

$$\pi_j = \lim_{m \to \infty} p_{ij}^{(m)} = \frac{1}{\mu_{jj}}.$$

Example 8.13 A particle moves along the real axis. Starting from position (state) $i$
it jumps to state $i + 1$ with probability $p$ and to state $i - 1$ with probability $q = 1 - p$,
$i = 1, 2, \ldots$ When the particle arrives at state 0, it remains there for a further time unit
with probability $q$ or jumps to state 1 with probability $p$. Let $X_n$ denote the position
of the particle after the $n$th jump (time unit). Under which condition does the Markov
chain $\{X_0, X_1, \ldots\}$ have a stationary distribution?

Since $p_{00} = q$, $p_{i,i+1} = p$, and $p_{i,i-1} = q = 1 - p$; $i = 1, 2, \ldots$, the system (8.9) is

$$\pi_0 = \pi_0 q + \pi_1 q,$$
$$\pi_i = \pi_{i-1} p + \pi_{i+1} q; \quad i = 1, 2, \ldots$$

By recursively solving this system of equations,

$$\pi_i = \Big(\frac{p}{q}\Big)^i \pi_0; \quad i = 0, 1, \ldots$$

To ensure that $\sum_{i=0}^{\infty} \pi_i = 1$, condition $p < q$ or, equivalently, $p < 1/2$, must hold. In
this case,

$$\pi_i = \frac{q - p}{q} \Big(\frac{p}{q}\Big)^i; \quad i = 0, 1, \ldots \qquad (8.32)$$

The necessary condition $p < 1/2$ for the existence of a stationary distribution is
intuitive, since otherwise the particle would tend to drift to infinity. But then no
time-invariant behavior of the Markov chain can be expected. □
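
As a plausibility check, the geometric form (8.32) can be substituted back into the balance
equations. The snippet below is a sketch under stated assumptions: the function name
`stationary_probs`, the value $p = 0.3$ and the truncation after 200 states are illustrative
choices only.

```python
def stationary_probs(p: float, n_states: int):
    """pi_i = ((q - p)/q) * (p/q)**i for i = 0, ..., n_states-1, see (8.32)."""
    q = 1.0 - p
    if not p < q:
        raise ValueError("a stationary distribution requires p < 1/2")
    return [(q - p) / q * (p / q) ** i for i in range(n_states)]

p = 0.3                                   # hypothetical jump probability to the right
q = 1.0 - p
pi = stationary_probs(p, 200)

# residuals of pi_0 = pi_0 q + pi_1 q and pi_i = pi_{i-1} p + pi_{i+1} q
eq0 = pi[0] - (pi[0] * q + pi[1] * q)
eqs = [pi[i] - (pi[i - 1] * p + pi[i + 1] * q) for i in range(1, 199)]
max_residual = max(abs(eq0), *(abs(e) for e in eqs))
total = sum(pi)                           # close to 1, the truncated tail is negligible
```

Calling `stationary_probs` with $p \ge 1/2$ raises an error, which mirrors the necessary
condition $p < 1/2$.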


Theorem 8.9 Let $\{X_0, X_1, \ldots\}$ be an irreducible, recurrent Markov chain with state
space $\mathbf{Z}$ and stationary state probabilities $\pi_i$, $i \in \mathbf{Z}$. If $g$ is any bounded function on
$\mathbf{Z}$, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} g(X_j) = \sum_{i \in \mathbf{Z}} \pi_i \, g(i).$$

For example, if $c_i = g(i)$ is the 'profit' which arises when the Markov chain makes a
transition to state $i$, then

$$\sum_{i \in \mathbf{Z}} \pi_i c_i$$

is the mean profit in the long-run resulting from a state change of the Markov chain.


Thus, theorem 8.9 is the analog to the renewal reward theorem (formula (7.148) at
page 325) for compound renewal processes. In particular, let

$$g(i) = \begin{cases} 1 & \text{for } i = k, \\ 0 & \text{for } i \ne k. \end{cases}$$

If changes of state of the Markov chain occur after unit time intervals, then the limit

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} g(X_j)$$

is equal to the mean percentage of time the system is in state $k$. By theorem 8.9, this
percentage coincides with $\pi_k$. This property of the stationary state distribution
illustrates once more that it refers to an equilibrium state of the Markov chain. A proof of
theorem 8.9 under weaker assumptions can be found in Tijms (1994).


Example 8.14 A system can be in one of the three states 1, 2, and 3: In state 1 it
operates most efficiently. In state 2 it is still working but its efficiency is lower than
in state 1. State 3 is the down state: the system is no longer operating and has to be
maintained. State changes can only occur after a fixed time unit of length 1.
Transitions into the same state are allowed. If $X_n$ denotes the state of the system at time $n$,
then $\{X_0, X_1, \ldots\}$ is assumed to be a Markov chain with transition matrix

$$\mathbf{P} = \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0 & 0.6 & 0.4 \\ 0.8 & 0 & 0.2 \end{pmatrix}.$$

Note that from state 3 the system most likely makes a transition to state 1, but it may
also stay in state 3 for one or more time units (for example, if a maintenance action
has not been successful). The corresponding stationary state probabilities satisfy the
system of linear equations

$$\pi_1 = 0.8\,\pi_1 + 0.8\,\pi_3,$$
$$\pi_2 = 0.1\,\pi_1 + 0.6\,\pi_2,$$
$$\pi_3 = 0.1\,\pi_1 + 0.4\,\pi_2 + 0.2\,\pi_3.$$

Only two of these equations are linearly independent. Together with the normalizing
constraint $\pi_1 + \pi_2 + \pi_3 = 1$, the unique solution is

$$\pi_1 = \frac{4}{6}, \quad \pi_2 = \pi_3 = \frac{1}{6}. \qquad (8.33)$$



The profits the system makes per unit time in states 1 and 2 are

$$g(1) = \$1000, \quad g(2) = \$600,$$

whereas, when in state 3, the system causes a loss of \$100 per unit time, i.e.

$$g(3) = -\$100.$$

According to theorem 8.9, after an infinite (sufficiently long) running
time, the mean profit per unit time is

$$\sum_{i=1}^{3} \pi_i \, g(i) = 1000 \cdot \frac{4}{6} + 600 \cdot \frac{1}{6} - 100 \cdot \frac{1}{6} = 750 \; [\$ \text{ per unit time}].$$

Now, let $Y$ be the random time in which the system is in the profitable states 1 and 2.
According to the structure of the transition matrix, such a time period must begin with
state 1. Further, let $Z$ be the random time in which the system is in the unprofitable
state 3. The mean values $E(Y)$ and $E(Z)$ are to be determined. The random vector
$(Y, Z)$ characterizes the typical cycle of an alternating renewal process. Therefore, by
formula (7.144), page 322, the ratio

$$E(Y) / [E(Y) + E(Z)]$$

is equal to the mean percentage of time the system is in states 1 or 2. As pointed out
after theorem 8.9, this percentage must be equal to $\pi_1 + \pi_2$:

$$\frac{E(Y)}{E(Y) + E(Z)} = \pi_1 + \pi_2. \qquad (8.34)$$

Since the mean time between transitions into state 3 is equal to $E(Y) + E(Z)$, the ratio
$1/[E(Y) + E(Z)]$ is equal to the rate of transitions to state 3. On the other hand, this
rate is $\pi_1 p_{13} + \pi_2 p_{23}$. Hence,

$$\frac{1}{E(Y) + E(Z)} = \pi_1 p_{13} + \pi_2 p_{23}. \qquad (8.35)$$

From (8.34) and (8.35),

$$E(Y) = \frac{\pi_1 + \pi_2}{\pi_1 p_{13} + \pi_2 p_{23}}, \qquad E(Z) = \frac{\pi_3}{\pi_1 p_{13} + \pi_2 p_{23}}.$$

Substituting the numerical values (8.33) gives $E(Y) = 6.25$ and $E(Z) = 1.25$. Hence,
the percentage of time the system is in the profit-generating states 1 and 2 is

$$\frac{6.25}{7.50} \cdot 100\,\% = 83.3\,\%.$$ □
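
The numbers of this example can be reproduced with exact rational arithmetic. The code
below is a sketch, not the author's method: the helper `stationary` (a small Gauss-Jordan
solver) is a hypothetical name, and the loss in state 3 is encoded by the assumed sign
convention $g(3) = -100$.

```python
from fractions import Fraction as F

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1 exactly by Gauss-Jordan elimination."""
    n = len(P)
    # first n-1 rows: balance equations sum_j pi_j p_ji - pi_i = 0; last row: normalization
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] + [F(0)] for i in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

P = [[F(8, 10), F(1, 10), F(1, 10)],
     [F(0),     F(6, 10), F(4, 10)],
     [F(8, 10), F(0),     F(2, 10)]]

pi = stationary(P)                          # stationary state probabilities (8.33)
g = [1000, 600, -100]                       # profit per unit time; loss as negative value
mean_profit = sum(x * y for x, y in zip(pi, g))

rate_into_3 = pi[0] * P[0][2] + pi[1] * P[1][2]   # right-hand side of (8.35)
EY = (pi[0] + pi[1]) / rate_into_3
EZ = pi[2] / rate_into_3
```

Using `Fraction` avoids rounding, so the results can be compared directly with the
fractions given in the example.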


Example 8.15 An insurer knows that the total annual claim size $X$ of a client in a
certain portfolio is exponentially distributed with mean value $E(X) = \$1000$, i.e.

$$F(x) = P(X \le x) = 1 - e^{-x/1000}, \quad x \ge 0.$$

The insurer partitions his clients into classes 1, 2, and 3 depending on the annual
amounts they claim, and the class they belong to: A client, who is in class 1 in the
current year, will make a transition to class 1, 2 or 3 next year, when his respective
total claims are between 0 and 600, 600 and 1200, or greater than 1200 in the current
year. A client, who is in class 2 in the current year, will make a transition to class 1,
2, or 3 next year if his respective total claim sizes are between 0 and 500, 500 and
1100, or more than 1100. A client, who is in class 3 and claims between 0 and 1100
or at least 1100 in the current year, will be in class 2 or in class 3 next year,
respectively. In this case, a direct transition from class 3 to class 1 is not possible. When in
class 1, 2, or 3, the clients will pay the respective premiums 600, 1200, or 1400 a
year. The one-step transition probabilities $p_{ij}$ are

$$p_{11} = F(600) = 0.4512, \quad p_{12} = F(1200) - F(600) = 0.2476,$$
$$p_{21} = F(500) = 0.3935, \quad p_{22} = F(1100) - F(500) = 0.2736,$$
$$p_{31} = 0, \quad p_{32} = F(1100) = 0.6671.$$

Taking into account $p_{i1} + p_{i2} + p_{i3} = 1$, $i = 1, 2, 3$, the complete matrix of the
one-step transition probabilities is
$$\mathbf{P} = \begin{pmatrix} 0.4512 & 0.2476 & 0.3012 \\ 0.3935 & 0.2736 & 0.3329 \\ 0.0000 & 0.6671 & 0.3329 \end{pmatrix}.$$

By (8.9), the stationary state probabilities satisfy the system of linear equations (note
that one of the equations (8.9) is redundant, i.e., linearly dependent on the other two
equations, and must be replaced by the normalizing equation (8.10)):

$$\pi_1 = 0.4512\,\pi_1 + 0.3935\,\pi_2,$$
$$\pi_2 = 0.2476\,\pi_1 + 0.2736\,\pi_2 + 0.6671\,\pi_3,$$
$$1 = \pi_1 + \pi_2 + \pi_3.$$

The solution is

$$\pi_1 = 0.2823, \quad \pi_2 = 0.3938, \quad \pi_3 = 0.3239.$$

Hence, the average annual long-run premium a client has to pay is

$$\sum_{i=1}^{3} \pi_i \, g(i) = 0.2823 \cdot 600 + 0.3938 \cdot 1200 + 0.3239 \cdot 1400 = 1095.4$$

so that the long-run average profit of the insurer per client and year is \$95.4. □
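
The transition matrix and the stationary probabilities of this example can be recomputed
from the exponential claim size distribution. The following sketch makes two assumptions
that are not in the text: it uses the unrounded transition probabilities, and it obtains the
stationary distribution by simply iterating $\pi \leftarrow \pi \mathbf{P}$ (the names `cdf` and
`stationary_by_iteration` are illustrative).

```python
from math import exp

def cdf(x, mean=1000.0):
    """Exponential distribution function F(x) = 1 - exp(-x/mean)."""
    return 1.0 - exp(-x / mean)

# rows: current class 1, 2, 3; columns: class in the next year
P = [
    [cdf(600), cdf(1200) - cdf(600), 1.0 - cdf(1200)],
    [cdf(500), cdf(1100) - cdf(500), 1.0 - cdf(1100)],
    [0.0,      cdf(1100),            1.0 - cdf(1100)],
]

def stationary_by_iteration(P, steps=500):
    """Approximate the stationary distribution by repeated multiplication pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[j] * P[j][i] for j in range(n)) for i in range(n)]
    return pi

pi = stationary_by_iteration(P)
premium = sum(x * g for x, g in zip(pi, [600, 1200, 1400]))   # long-run mean premium
profit = premium - 1000.0                                      # mean claim size is 1000
```

Because the chain is irreducible and aperiodic, the iteration converges for any starting
vector; small deviations from the rounded values in the example are to be expected.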


8.4 BIRTH AND DEATH PROCESSES 


8.4.1 Introduction 


In some of the examples considered so far only direct transitions to 'neighboring'
states were possible. More exactly, if starting at state $i$ and not staying there for one
or more time units, only transitions to states $i - 1$ or $i + 1$ could be made in one step.
In these cases, the positive one-step transition probabilities have structure (Figure 8.2)

$$p_{i,i+1} = p_i, \quad p_{i,i-1} = q_i, \quad p_{ii} = r_i \quad \text{with} \quad p_i + q_i + r_i = 1. \qquad (8.36)$$

A discrete Markov chain with state space $\mathbf{Z} = \{0, 1, \ldots, z\}$, $z \le \infty$, and transition
probabilities (8.36) is called a birth and death process. The state space implies $q_0 = 0$.
$r_i = 1 - p_i - q_i$ is the probability that the process stays for another time unit at state $i$.
The term birth and death process results from the application of these processes to
describing the development in time of biological populations. In this context, $X_n$ is
the number of individuals of a population at time $n$ assuming that the population does
not increase or decrease by more than one individual per unit time. Correspondingly,
the $p_i$ and the $q_i$ are called birth and death probabilities, respectively.


Figure 8.2 Transition graph of a birth and death process with infinite state space
A birth and death process is called a pure birth process if all the $q_i$ are 0 (no deaths
are possible), and a pure death process if all the $p_i$ are 0 (no births are possible).


To make sure that a birth and death process is irreducible, the assumptions (8.36)
have to be supplemented by

$$p_i > 0 \text{ for } i = 0, 1, \ldots \quad \text{and} \quad q_i > 0 \text{ for } i = 1, 2, \ldots \qquad (8.37)$$

For instance, the random walk of example 8.13 is a birth and death process with

$$p_i = p, \; q_i = q, \; r_i = 0 \text{ for } i = 1, 2, \ldots; \quad p_0 = p, \; q_0 = 0, \; r_0 = q = 1 - p.$$

The unbounded random walk in example 8.11 also makes direct transitions only to
neighboring states. But its state space is $\mathbf{Z} = \{0, \pm 1, \pm 2, \ldots\}$ so that this random walk
is not a birth and death process.


8.4.2 General Random Walk with two Absorbing Barriers 


In generalizing example 8.3, a random walk with state space $\mathbf{Z} = \{0, 1, \ldots, z\}$ and
transition probabilities (8.36) is considered, which satisfy the additional conditions

$$r_0 = r_z = 1, \quad p_i > 0 \text{ and } q_i > 0 \text{ for } i = 1, 2, \ldots, z - 1. \qquad (8.38)$$

Thus, states 0 and $z$ are absorbing (Figure 8.3).


Figure 8.3 Transition graph of a birth and death process with absorbing barriers


Let $a(n)$ be the probability that the random walk is absorbed by state 0 when starting
from $n$; $n = 1, 2, \ldots, z - 1$. (Since $z$ is absorbing as well, the process cannot have been
in state $z$ before arriving at state 0.) It is obvious that

$$1 = a(0) > a(1) > \cdots > a(z - 1) > a(z) = 0. \qquad (8.39)$$

From the total probability rule (1.24),

$$a(n) = p_n \, a(n + 1) + q_n \, a(n - 1) + r_n \, a(n), \qquad (8.40)$$

or, equivalently, when replacing $r_n$ with $r_n = 1 - p_n - q_n$,

$$a(n) - a(n + 1) = \frac{q_n}{p_n} \, [a(n - 1) - a(n)]; \quad n = 1, 2, \ldots, z - 1.$$

Repeated application of these difference equations gives

$$a(n) - a(n + 1) = A_n \, [a(0) - a(1)]; \quad n = 0, 1, \ldots, z - 1, \qquad (8.41)$$

with

$$A_n = \frac{q_1 q_2 \cdots q_n}{p_1 p_2 \cdots p_n}; \quad n = 1, 2, \ldots, z - 1; \quad A_0 = 1, \qquad (8.42)$$

and $a(0) = 1$ and $a(z) = 0$.
Summing equations (8.41) from $n = k$ to $n = z - 1$ yields

$$a(k) = \sum_{n=k}^{z-1} [a(n) - a(n + 1)] = [a(0) - a(1)] \sum_{n=k}^{z-1} A_n.$$

In particular, for $k = 0$,

$$1 = [a(0) - a(1)] \sum_{n=0}^{z-1} A_n.$$

By combining the last two equations,

$$a(k) = \frac{\sum_{n=k}^{z-1} A_n}{\sum_{n=0}^{z-1} A_n}; \quad k = 0, 1, \ldots, z - 1; \quad a(z) = 0, \; A_0 = 1. \qquad (8.43)$$

The probability of absorption at state $z$ if the particle starts at $k$ is $b(k) = 1 - a(k)$.
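
Formula (8.43) translates directly into a few lines of code. The sketch below uses the
birth and death probabilities of example 8.16 (Table 8.2) as input; the function name
`absorption_probs` is hypothetical, and the value $p_2 = 0.5$ is an assumption inferred
from $r_2 = 1 - p_2 - q_2$ with $q_2 = 0.3$ and $r_2 = 0.2$.

```python
def absorption_probs(p, q):
    """a(k), k = 0, ..., z, by (8.43); p[n], q[n] are indexed by the state n = 0, ..., z."""
    z = len(p) - 1
    A = [1.0]                               # A_0 = 1
    for n in range(1, z):
        A.append(A[-1] * q[n] / p[n])       # A_n = A_{n-1} * q_n / p_n, see (8.42)
    total = sum(A)
    return [sum(A[k:]) / total for k in range(z)] + [0.0]

# birth and death probabilities for the states 0, 1, ..., 6 (states 0 and 6 absorbing)
p = [0.0, 0.8, 0.5, 0.4, 0.3, 0.1, 0.0]
q = [0.0, 0.1, 0.3, 0.4, 0.5, 0.8, 0.0]

a = absorption_probs(p, q)                  # absorption at 0
b = [1.0 - x for x in a]                    # absorption at z = 6
```

Since these probabilities are mirror images of each other ($p_k = q_{6-k}$), the result
should satisfy $a(k) = b(6 - k)$.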


Gambler's ruin problem: The probabilities $a(k)$ can be interpreted as follows (compare
to example 8.3): Two gamblers begin a game with stakes of sizes $k$ and $z - k$,
respectively; $k$, $z$ integers with $0 < k < z$. After each move a gambler either wins or
loses \$1 or the gambler's stake remains constant. These possibilities are controlled
by transition probabilities satisfying (8.36) and (8.38). The game is finished if a
gambler has won the entire stake of the other one or, equivalently, if one gambler has lost
her/his entire stake.


Mean time to absorption Let $m(n)$ be the mean number of time units (steps) till the
particle arrives at any of the absorbing states 0 or $z$, when it has started at location $n$,
$0 < n < z$. If the particle moves from the starting point $n$ to the right, then the mean
time till absorption is $1 + m(n + 1)$; if the particle jumps to the left, then the mean time
till absorption is $1 + m(n - 1)$, and if the particle stays at position $n$ a further time
unit, then the mean time to absorption is $1 + m(n)$. Hence, analogously to (8.19), the
$m(n)$ satisfy the system of equations

$$m(n) = p_n \, [1 + m(n + 1)] + q_n \, [1 + m(n - 1)] + r_n \, [1 + m(n)]. \qquad (8.44)$$

When replacing $r_n$ with $r_n = 1 - p_n - q_n$, the system of equations (8.44) for
the $m(n)$ becomes a system of equations for the differences $d(n) = m(n) - m(n - 1)$:

$$d(n + 1) = \frac{q_n}{p_n} \, d(n) - \frac{1}{p_n}; \quad n = 1, 2, \ldots, z - 1. \qquad (8.45)$$

The boundary conditions are $m(0) = m(z) = 0$ so that $d(1) = m(1)$.
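$k$-fold application of the recursive equations (8.45) starting with $n = 1$ yields

$$d(2) = \frac{q_1}{p_1} \, m(1) - \frac{1}{p_1},$$

$$d(3) = \frac{q_2}{p_2} \, d(2) - \frac{1}{p_2} = \frac{q_1 q_2}{p_1 p_2} \, m(1) - \frac{q_2}{p_1 p_2} - \frac{1}{p_2},$$

$$d(4) = \frac{q_3}{p_3} \, d(3) - \frac{1}{p_3} = \frac{q_1 q_2 q_3}{p_1 p_2 p_3} \, m(1) - \frac{q_2 q_3}{p_1 p_2 p_3} - \frac{q_3}{p_2 p_3} - \frac{1}{p_3},$$

and, finally,

$$d(k) = A_{k-1} \, m(1) - \sum_{i=2}^{k-1} \frac{q_i q_{i+1} \cdots q_{k-1}}{p_{i-1} p_i \cdots p_{k-1}} - \frac{1}{p_{k-1}}; \quad k = 2, 3, \ldots, z, \qquad (8.46)$$

where the $A_{k-1}$ are given by (8.42) with $n = k - 1$. The desired mean values $m(n)$ are
simply obtained by summation of the $d(k)$:

$$m(n) = \sum_{k=1}^{n} d(k) = \sum_{k=1}^{n} [m(k) - m(k - 1)], \quad n = 1, 2, \ldots, z. \qquad (8.47)$$

The still unknown $m(1)$, which occurs as a factor in each of the $d(k)$, can be
determined from (8.47) by making use of the boundary condition $m(z) = 0$, i.e. from

$$m(z) = 0 = \sum_{n=1}^{z} d(n).$$

The result is

$$m(1) = \frac{\displaystyle \sum_{k=2}^{z-1} \sum_{i=2}^{k} \frac{q_i q_{i+1} \cdots q_k}{p_{i-1} p_i \cdots p_k} + \sum_{k=1}^{z-1} \frac{1}{p_k}}{\displaystyle 1 + \sum_{k=1}^{z-1} A_k}. \qquad (8.48)$$
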

k        0      1        2        3        4        5        6
p_k      0      0.8      0.5      0.4      0.3      0.1      0
q_k      0      0.1      0.3      0.4      0.5      0.8      0
r_k      1      0.1      0.2      0.2      0.2      0.1      1
A_k      1      0.125    0.075    0.075    0.125    1.0
a(k)     1      0.5833   0.5313   0.5000   0.4687   0.4167   0
b(k)     0      0.4167   0.4687   0.5000   0.5313   0.5833   1
m(k)     0      53.54    58.95    60.50    58.95    53.54    0


Table 8.2 Numerical results for example 8.16


Example 8.16 A random walk with state space $\mathbf{Z} = \{0, 1, 2, \ldots, 6\}$ and the absorbing
barriers 0 and 6 is considered. Table 8.2 shows the birth and death probabilities $p_k$
and $q_k$, the corresponding $r_k$, the ratios $A_k$ defined by (8.42), the absorption
probabilities $a(k)$ and $b(k)$ with regard to locations 0 and 6, respectively, and the mean
times to absorption $m(k)$ at any of the locations 0 or 6. From (8.48),

$$m(1) = 53.54.$$

Now the mean times to absorption $m(2), m(3), \ldots, m(6)$ can be obtained from (8.47).
For manual calculations, it is most efficient to determine the $d(k)$ recursively by
(8.45). In view of the symmetric structure of the birth and death probabilities, the
absorption probabilities $a(k)$ and $b(6 - k)$, $k = 0, 1, 2, 3$, coincide. □
8.4.3 General Random Walk with One Absorbing Barrier


The same situation as in section 8.4.2 is considered except that state $z$ is no longer
assumed to be absorbing (Figure 8.4), i.e. the corresponding transition probabilities
have properties

$$r_0 = 1, \; p_z = 0, \; q_z > 0, \; r_z = 1 - q_z; \quad p_i > 0 \text{ and } q_i > 0 \text{ for } i = 1, 2, \ldots, z - 1.$$


Figure 8.4 Transition graph for a random walk with absorption at 0


These transition probabilities imply that state 0 is absorbing, whereas from state $z$
transitions to state $z - 1$ are possible. The states $1, 2, \ldots, z - 1$ are transient so that after
a random number of time units the particle will arrive at location 0 with probability 1.
Again, jumps of the particle (possibly to the same location) always occur after one
time unit. Since the boundary condition $m(0) = 0$ is the same as in the previous
section, formulas (8.46) and (8.47) stay valid for $k = 1, 2, \ldots, z - 1$. Since $p_z = 0$,
equation (8.44) yields for $n = z$ the boundary condition

$$m(z) = q_z \, [1 + m(z - 1)] + (1 - q_z) \, [1 + m(z)],$$

or, equivalently,

$$m(z) - m(z - 1) = \frac{1}{q_z}. \qquad (8.49)$$



Letting $n = z - 1$ in (8.47) and combining the resulting equation with (8.49) leads to
an equation for $m(1)$, the solution of which is

$$m(1) = \frac{1}{A_{z-1}} \Big[ \sum_{i=2}^{z-1} \frac{q_i q_{i+1} \cdots q_{z-1}}{p_{i-1} p_i \cdots p_{z-1}} + \frac{1}{p_{z-1}} + \frac{1}{q_z} \Big],$$

or, equivalently,

$$m(1) = \frac{1}{q_1} + \sum_{i=2}^{z} \frac{p_1 p_2 \cdots p_{i-1}}{q_1 q_2 \cdots q_i}. \qquad (8.50)$$

Now the $m(2), m(3), \ldots, m(z)$ can be recursively determined by (8.45) or (8.46),
respectively, or directly by (8.47). After some algebra, a more elegant representation of
$m(k)$ is obtained by inserting (8.50) into (8.47) (Nisbet, Gurney (1982)):

$$m(k) = m(1) + \sum_{n=1}^{k-1} \Big[ \frac{q_1 q_2 \cdots q_n}{p_1 p_2 \cdots p_n} \sum_{i=n+1}^{z} \frac{p_1 p_2 \cdots p_{i-1}}{q_1 q_2 \cdots q_i} \Big]; \quad k = 2, 3, \ldots, z.$$



Mean Time to Extinction $m(k)$ can be interpreted as the mean time to the extinction
of a finite population under the following assumptions: The maximal possible
number of individuals the environment can sustain is $z$. If the population has $k$
members, it will grow per time unit by one individual with probability $p_k$, $1 \le k \le z - 1$, it
will decrease per time unit by one individual with probability $q_k$, $1 \le k \le z$, or the
number of members does not change per time unit with probability $r_k = 1 - p_k - q_k$.
In addition, $q_0 = p_0 = p_z = 0$. No immigration occurs. One jump per time unit
(possibly to the same state) is realistic if the time unit is chosen small enough. If this
birth and death process arrives at the absorbing state 0, the population is extinct.


Example 8.17 Consider a population with a maximal size of $z = 6$ individuals and
transition probabilities with regard to a unit time given by Table 8.3. Then, by (8.50),

$$m(1) = \frac{1}{q_1} + \frac{p_1}{q_1 q_2} + \frac{p_1 p_2}{q_1 q_2 q_3} + \frac{p_1 p_2 p_3}{q_1 q_2 q_3 q_4} + \frac{p_1 p_2 p_3 p_4}{q_1 q_2 q_3 q_4 q_5} + \frac{p_1 p_2 p_3 p_4 p_5}{q_1 q_2 q_3 q_4 q_5 q_6} = 155.$$

Table 8.3 shows the mean times to extinction $m(1), m(2), \ldots, m(6)$. Condition (8.49) is
satisfied. □

k        0      1        2         3         4         5         6
p_k      0      0.8      0.5       0.4       0.2       0.1       0
q_k      0      0.1      0.2       0.4       0.5       0.6       0.8
r_k      1      0.1      0.3       0.2       0.3       0.3       0.2
d(k)            155      18.125    5.250     2.750     1.875     1.250
m(k)     0      155      173.125   178.375   181.125   183.000   184.250


Table 8.3 Numerical results for example 8.17
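
The entries of Table 8.3 can be recomputed from (8.50) and the recursion (8.45). The
following sketch is only an illustration; the function name `mean_times_to_extinction`
and the list-based indexing by state are assumptions made here.

```python
def mean_times_to_extinction(p, q):
    """m(k), k = 0, ..., z, for absorption at 0 when state z is not absorbing."""
    z = len(q) - 1
    # (8.50): m(1) = 1/q_1 + sum_{i=2}^{z} p_1...p_{i-1} / (q_1...q_i)
    term = 1.0 / q[1]
    m1 = term
    for i in range(2, z + 1):
        term *= p[i - 1] / q[i]
        m1 += term
    # (8.45): d(n+1) = (q_n/p_n) d(n) - 1/p_n with d(1) = m(1), then m(n+1) = m(n) + d(n+1)
    m = [0.0, m1]
    d = m1
    for n in range(1, z):
        d = q[n] / p[n] * d - 1.0 / p[n]
        m.append(m[-1] + d)
    return m

# transition probabilities of Table 8.3 for the states 0, 1, ..., 6
p = [0.0, 0.8, 0.5, 0.4, 0.2, 0.1, 0.0]
q = [0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8]
m = mean_times_to_extinction(p, q)
```

The last difference `m[6] - m[5]` should equal $1/q_6 = 1.25$, which is the boundary
condition (8.49).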


Theorem 8.10 Under the additional assumptions (8.37) on its transition probabilities
(8.36), a birth and death process is recurrent if and only if

$$\sum_{n=1}^{\infty} \frac{q_1 q_2 \cdots q_n}{p_1 p_2 \cdots p_n} = \infty. \qquad (8.51)$$

Proof It is sufficient to show that state 0 is recurrent. This can be established by using
the result (8.43) referring to a general random walk with two absorbing barriers, since

$$\lim_{z \to \infty} a(k) = f_{k0}, \quad k = 1, 2, \ldots,$$

where the first passage time probabilities $f_{k0}$ are given by (8.26). If state 0 is
recurrent, then, from the irreducibility of the Markov chain, $f_{00} = 1$ and $f_{k0} = 1$. However,
$f_{k0} = 1$ if and only if (8.51) is valid. Conversely, let (8.51) be true. Then, by the total
probability rule,

$$f_{00} = p_{00} + p_{01} \, f_{10} = r_0 + p_0 \cdot 1 = 1.$$ ■

Discrete-time birth and death processes have significance on their own, but may also
serve as approximations to the more important continuous-time birth and death
processes, which are the subject of section 9.6.
8.5 DISCRETE-TIME BRANCHING PROCESSES


8.5.1 Introduction


Closely related to pure birth processes are branching processes. In this section, the
simplest branching process, the Galton-Watson process, is considered. The
terminology applied refers to population dynamics. The Galton-Watson process $\{X_0, X_1, \ldots\}$
is characterized by the following properties (for illustration, see a tree representation
of a sample path of this process on condition $X_0 = 1$ in Figure 8.5):


1) The population starts with $X_0$ individuals. They constitute the zeroth generation.

2) Each individual $i$ of the zeroth generation has $Y_{i,0}$ offspring; $i = 1, 2, \ldots, X_0$. The
$Y_{i,0}$ are independent and identically distributed as a random variable $Y$ with

$$p_k = P(Y = k); \; k = 0, 1, \ldots, \quad \sum_{k=0}^{\infty} p_k = 1; \quad \mu = E(Y) \text{ and } \sigma^2 = Var(Y). \qquad (8.52)$$

The set of all offspring of individuals of the zeroth generation constitutes the first
generation. The total number of all individuals in the first generation is denoted as $X_1$:

$$X_1 = \sum_{i=1}^{X_0} Y_{i,0}.$$

3) Generally, each member $i$ of the $(n - 1)$th generation produces a random number
$Y_{i,n-1}$ of offspring, and all $Y_{i,n-1}$ are independent and identically distributed as $Y$. In
addition, the $Y_{i,n-1}$ are independent of all previous offspring figures

$$Y_{i,n-2}, \ldots, Y_{i,0}; \quad n = 2, 3, \ldots$$

The set of offspring generated by the $(n - 1)$th generation constitutes the $n$th
generation with a total of $X_n$ individuals, $n = 1, 2, \ldots$
4) All individuals of a generation are of the same type.


According to its construction, the random sequence $\{X_0, X_1, \ldots\}$ is a discrete-time
Markov chain. Given $X_0 = i$, its $m$-step transition probabilities (8.3) are equal to the
absolute state probabilities $p_j^{(m)} = P(X_m = j)$ of $X_m$:

$$p_{ij}^{(m)} = P(X_m = j \mid X_0 = i).$$


Figure 8.5 Piece of a sample path of a Galton-Watson process (tree representation with
generation sizes $X_0 = 1$, $X_1 = 1$, $X_2 = 3$, $X_3 = 6$, $X_4 = 5$)
The first motive for dealing with branching processes was to determine the duration
of (noble) families. The French statistician L. F. Benoiston de Chateauneuf
(1776-1856) estimated their average duration to be 300 years (according to Moser (1839)).
As pointed out by Heyde, Seneta (1972), I. J. Bienaymé (1796-1878) was very
likely able to determine the probability of the extinction of family names based on
the extinction of male offspring, but, unfortunately, did not leave behind any written
account. Sir F. Galton (1822-1911) and H. W. Watson (1822-1900) formulated the
mathematical problem, but could not fully solve it; see Galton, Watson (1875). This
was done by the Danish actuary J. F. Steffenson only in 1930 (Steffenson (1930)).
Other applications of branching processes include, among others, mutant genes dynamics,
nuclear chain reactions, electron multipliers to boost a current of electrons, and cell
kinetics. There are numerous generalizations of the Galton-Watson process, e.g.,
multi-type branching processes, continuous-time branching processes, and age
dependent branching processes. Recent monographs on theory and applications of
branching processes are Haccou et al. (2011), Kimmel, Axelrod (2015), and Durret
(2015). Pioneering classics are Harris (1963) and Sevastyanov (1971).


8.5.2 Generating Function and Distribution Parameters 


In what follows, the assumption is made that the development of the population starts
with one individual, i.e. $X_0 = 1$. The respective z-transforms (moment generating
functions) of $Y$ and $X_n$ are denoted as (section 2.5, page 96)

$$M(z) = E(z^Y) = \sum_{k=0}^{\infty} p_k z^k,$$
$$M_n(z) = E(z^{X_n}) = \sum_{k=0}^{\infty} P(X_n = k) \, z^k; \quad n = 0, 1, \ldots$$

In particular,

$$M_0(z) = z \quad \text{and} \quad M_1(z) = M(z). \qquad (8.53)$$

According to the notation introduced,

$$X_n = \sum_{i=1}^{X_{n-1}} Y_{i,n-1},$$

where the random variables $Y_{1,n-1}, Y_{2,n-1}, \ldots, Y_{X_{n-1},n-1}$ are independent and
identically distributed as $Y$. Hence, by formula (2.116), page 99, on condition $X_{n-1} = m$
the z-transform of $X_n$ is

$$M_n(z \mid X_{n-1} = m) = [M(z)]^m; \quad m = 0, 1, 2, \ldots$$

Now, by using this result and the formula of the total probability,

$$M_n(z) = \sum_{k=0}^{\infty} P(X_n = k) \, z^k$$
$$= \sum_{k=0}^{\infty} \sum_{m=0}^{\infty} P(X_n = k \mid X_{n-1} = m) \, P(X_{n-1} = m) \, z^k$$
$$= \sum_{m=0}^{\infty} P(X_{n-1} = m) \sum_{k=0}^{\infty} z^k \, P(X_n = k \mid X_{n-1} = m)$$
$$= \sum_{m=0}^{\infty} P(X_{n-1} = m) \, [M(z)]^m.$$

The last row is the z-transform of $X_{n-1}$ with the variable $z$ replaced by the variable
$M(z)$, i.e. the following recursive equation for the $M_n(z)$ is valid:

$$M_n(z) = M_{n-1}(M(z)), \quad n = 1, 2, \ldots \qquad (8.54)$$

A similar recursive equation for $M_n(z)$ is

$$M_n(z) = M(M_{n-1}(z)), \quad n = 1, 2, \ldots, \qquad (8.55)$$

which easily follows from (8.54) by induction:
For $n = 2$ formula (8.55) is true since by (8.53) and (8.54),

$$M_2(z) = M_1(M(z)) = M(M_1(z)).$$

Now assume $M_{n-1}(z) = M(M_{n-2}(z))$ is true. Then, by (8.54),

$$M_n(z) = M_{n-1}(M(z)) = M(M_{n-2}(M(z))) = M(M_{n-1}(z)),$$

which proves (8.55).
The first and second derivative of $M_n(z)$ given by (8.55) with regard to $z$ are

$$M_n'(z) = M'(M_{n-1}(z)) \cdot M_{n-1}'(z), \qquad (8.56)$$
$$M_n''(z) = M''(M_{n-1}(z)) \cdot [M_{n-1}'(z)]^2 + M'(M_{n-1}(z)) \cdot M_{n-1}''(z). \qquad (8.57)$$

Now let $z = 1$. Then, since $M_n(1) = 1$ for all $n = 0, 1, \ldots$ and $\mu = E(Y) = M'(1)$,
formula (8.56) yields $M_n'(1) = \mu \, M_{n-1}'(1)$. Since $M_n'(1) = E(X_n)$, this is equivalent to

$$E(X_n) = \mu \, E(X_{n-1}), \quad n = 1, 2, \ldots$$

By repeated application of this relation,

$$E(X_n) = \mu^n, \quad n = 1, 2, \ldots \qquad (8.58)$$

Thus, if $\mu < 1$, i.e. there is on average less than one offspring per individual, the
population will sooner or later become extinct, since in this case

$$\lim_{n \to \infty} E(X_n) = 0.$$

From (8.57),

$$M_n''(1) = M''(1) \cdot [M_{n-1}'(1)]^2 + M'(1) \cdot M_{n-1}''(1), \quad n = 1, 2, \ldots$$

or, taking into account (8.58),

$$M_n''(1) = M''(1) \, \mu^{2(n-1)} + \mu \, M_{n-1}''(1), \quad n = 1, 2, \ldots$$

Repeated application of this recursive equation for the $M_n''(1)$ gives

$$M_n''(1) = M''(1) \, [\mu^{2n-2} + \mu^{2n-3} + \cdots + \mu^{n-1}].$$

By (2.112), page 96,

$$M''(1) = \sigma^2 - \mu + \mu^2 \quad \text{and} \quad M_n''(1) = Var(X_n) - \mu^n + \mu^{2n}.$$

After some algebra, $Var(X_n)$ becomes
$$Var(X_n) = M_n''(1) + \mu^n - \mu^{2n} = \sigma^2 \, [\mu^{2n-2} + \mu^{2n-3} + \cdots + \mu^{n-1}]$$
$$= \sigma^2 \mu^{n-1} \, [\mu^{n-1} + \mu^{n-2} + \cdots + 1], \quad n = 1, 2, \ldots$$

By making use of the finite geometric series (2.18) (page 48), the final result is

$$Var(X_n) = \begin{cases} \sigma^2 \mu^{n-1} \, \dfrac{\mu^n - 1}{\mu - 1} & \text{for } \mu \ne 1, \\[2mm] n \, \sigma^2 & \text{for } \mu = 1. \end{cases}$$

The variance of $X_n$ increases linearly with increasing $n$ if $\mu = 1$. For $\mu > 1$, this
variance increases with increasing $n$, and for $\mu < 1$ it eventually decreases and tends
to 0 as $n \to \infty$. Clearly, this increase/decrease occurs the faster the larger $\sigma^2$, which
is the variance of the number of offspring a member of the population has.
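
The formulas for $E(X_n)$ and $Var(X_n)$ can be cross-checked by computing the exact
distribution of $X_n$ from the recursion (8.54). The sketch below assumes a hypothetical
offspring distribution with finite support ($p_0 = 0.2$, $p_1 = 0.5$, $p_2 = 0.3$), so that all
z-transforms are polynomials; the helper names are illustrative.

```python
def poly_mul(a, b):
    """Product of two polynomials given by their coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compose(outer, inner):
    """Coefficients of outer(inner(z)), evaluated by Horner's scheme."""
    result = [outer[-1]]
    for c in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += c
    return result

def generation_pmf(offspring, n):
    """Probabilities P(X_n = k) given X_0 = 1, using M_n(z) = M_{n-1}(M(z)), see (8.54)."""
    pmf = [0.0, 1.0]                        # M_0(z) = z
    for _ in range(n):
        pmf = compose(pmf, offspring)
    return pmf

offspring = [0.2, 0.5, 0.3]                 # hypothetical p_0, p_1, p_2
mu = sum(k * pk for k, pk in enumerate(offspring))
sigma2 = sum(k * k * pk for k, pk in enumerate(offspring)) - mu ** 2

n = 5
pmf = generation_pmf(offspring, n)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean ** 2
var_formula = sigma2 * mu ** (n - 1) * (mu ** n - 1) / (mu - 1)   # case mu != 1
```

The polynomial degree doubles with every generation here, so this direct approach is
only practical for small $n$ and small offspring supports.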


8.5.3 Probability of Extinction and Examples 


A population can only become extinct if the probability $p_0$ (an individual has no
offspring) is positive. Hence, let us assume in this section that

$$0 < p_0 < 1.$$

As in the previous section, let $X_0 = 1$. Then the probability of extinction $\pi_0$ is
formally given as the limit of the $m$-step transition probabilities

$$\pi_0 = \lim_{m \to \infty} p_{10}^{(m)} = \lim_{m \to \infty} P(X_m = 0 \mid X_0 = 1).$$

By equations (2.9) (page 46) and (8.58),

$$\mu^n = E(X_n) = \sum_{i=1}^{\infty} P(X_n \ge i) \ge P(X_n \ge 1).$$

Thus, if $\mu < 1$, then $\lim_{n \to \infty} \mu^n = 0$ so that $\lim_{n \to \infty} P(X_n \ge 1) = 1 - \pi_0 = 0$. Hence, if $\mu < 1$,
then $\pi_0 = 1$. Moreover, it can be shown that $\pi_0 = 1$ even if $\mu = 1$. Since

$$p_{10}^{(n)} = P(X_n = 0 \mid X_0 = 1) = M_n(0), \quad n = 1, 2, \ldots$$

equation (8.55) implies that

$$\pi_0 = \lim_{n \to \infty} p_{10}^{(n)} = \lim_{n \to \infty} M_n(0) = M\big(\lim_{n \to \infty} M_{n-1}(0)\big) = M(\pi_0).$$

Thus, the probability of extinction $\pi_0$ satisfies the equation

$$z = M(z). \qquad (8.59)$$

This equation can have two solutions. In view of $M(1) = 1$, the integer $z_1 = 1$ is
always a solution. Hence, a possible second solution $z_2$ must satisfy $0 < z_2 < 1$.
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Without proof: The desired probability of extinction to is the smallest solution of 
the equation (8.59). Such a solution exists if u = E(Y) = 1. 


Let $T$ be the random time to extinction. Then $T$ is the smallest integer $n$ with property $X_n = 0$, i.e.,

$$T = \min\{n,\ X_n = 0\}.$$

The values of the distribution function $F_T(n)$ of $T$ at the 'jump points' $n$ are

$$F_T(n) = P(T \le n) = P(X_n = 0) = M_n(0), \quad n = 1, 2, \ldots.$$

Furthermore, $P(T \le n) = P(T \le n-1) + P(T = n)$ so that

$$P(T = n) = M_n(0) - M_{n-1}(0), \quad n = 1, 2, \ldots \quad (\text{with } M_0(0) = 0). \tag{8.60}$$

Given $\lim_{n \to \infty} P(X_n = 0) = 1$, by formula (2.9), page 46, the mean time to extinction is

$$E(T) = \sum_{n=1}^{\infty} \left[ 1 - M_{n-1}(0) \right].$$

A sufficient condition for $\lim_{n \to \infty} P(X_n = 0) = 1$ is $\mu \le 1$.
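The recursion $M_n(0) = M(M_{n-1}(0))$ behind (8.59) and (8.60) lends itself to a short numerical sketch. The following Python fragment is only an illustration: the function names and the offspring distribution $P(Y=0) = 0.25$, $P(Y=1) = 0.25$, $P(Y=2) = 0.5$ are assumptions made for this example and do not come from the text.

```python
def extinction_probability(M, tol=1e-12, max_iter=100_000):
    """Iterate z <- M(z) starting at z = 0.

    The iterates are M_1(0), M_2(0), ...; they increase towards the
    smallest solution of z = M(z), i.e. the extinction probability.
    """
    z = 0.0
    for _ in range(max_iter):
        z_next = M(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def extinction_time_pmf(M, n_max):
    """Return [P(T = 1), ..., P(T = n_max)] according to formula (8.60)."""
    probs = []
    prev = 0.0                 # M_0(0) = 0 because X_0 = 1
    for _ in range(n_max):
        cur = M(prev)          # M_n(0) = M(M_{n-1}(0))
        probs.append(cur - prev)
        prev = cur
    return probs


# Hypothetical offspring distribution (an assumption for illustration only):
# P(Y = 0) = 0.25, P(Y = 1) = 0.25, P(Y = 2) = 0.5, hence E(Y) = 1.25 > 1.
def M_example(z):
    return 0.25 + 0.25 * z + 0.5 * z * z


pi0 = extinction_probability(M_example)
pmf_T = extinction_time_pmf(M_example, 50)
print(pi0, pmf_T[:3])
```

For this assumed distribution, $z = M(z)$ reduces to $2z^2 - 3z + 1 = 0$ with solutions $1$ and $0.5$, so the iteration should settle at the smaller one.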
Example 8.18 A standard example for an application of the Galton-Watson process 
is due to Lotka (1931): Alfred Lotka investigated the random number Y of male 
offspring per male of the white population in the USA in 1920. (Some male offspring 


may arise out of wedlock so that Y need not refer to a married couple.) He found that 
Y has approximately a modified geometric distribution with z-transform 


$$M(z) = \frac{0.482 - 0.041\,z}{1 - 0.559\,z}.$$

From this it follows that with probability $p_0 = P(Y = 0) = M(0) = 0.482$ a male has no male offspring. The first and second derivatives of $M(z)$ are

$$M'(z) = \frac{0.2284}{(1 - 0.559\,z)^2}, \qquad M''(z) = \frac{0.2554}{(1 - 0.559\,z)^3},$$

so that $M'(1) = 1.1744$ and $M''(1) = 2.978$. Hence, by formulas (2.112),

$$E(Y) = M'(1) = 1.1744, \quad \mathrm{Var}(Y) = M''(1) + M'(1) - [M'(1)]^2 = 2.773, \quad \text{and} \quad \sqrt{\mathrm{Var}(Y)} = 1.665.$$

Thus, a male produces on average 1.1744 male offspring with a fairly high standard deviation of 1.665. In this case, formula (8.59) is a quadratic equation:

$$0.559\,z^2 - 1.041\,z + 0.482 = 0.$$


$z_1 = 1$ is surely a solution. The second solution is $z_2 = 0.86$, which is the desired probability of extinction: $\pi_0 = 0.86$.


Lotka found that the geometric distribution as given by formula (2.27), page 50, did not fit well to his data set. Hence he estimated $p_0 = 0.482$ from his data and calculated the $p_1, p_2, \ldots$ in such a way that their sum is $1 - p_0 = 0.518$:

$$p_i = 0.518 \cdot (1 - 0.559) \cdot 0.559^{\,i-1}; \quad i = 1, 2, \ldots.$$

Generally, for any fixed $p_0 = P(Y = 0)$ with $0 < p_0 < 1$, the $p_0$-modified geometric distribution is given by the probability mass function

$$p_i = P(Y = i) = (1 - p_0)\, p\, (1 - p)^{i-1}; \quad i = 1, 2, \ldots. \tag{8.61}$$

By formula (2.16), page 48, $\sum_{i=1}^{\infty} (1 - p)^{i-1} = \sum_{i=0}^{\infty} (1 - p)^i = 1/p$ so that indeed $\sum_{i=0}^{\infty} p_i = 1$. □
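The numbers of example 8.18 can be cross-checked numerically. The sketch below uses the values $p_0 = 0.482$ and $p = 0.441$ from the example; the function names and the truncation of the infinite series at 2000 terms are choices made here for illustration, not part of the text.

```python
import math

p0, p = 0.482, 0.441          # values from example 8.18; 1 - p = 0.559


def pmf(i):
    """p0-modified geometric probability P(Y = i), formula (8.61)."""
    return p0 if i == 0 else (1 - p0) * p * (1 - p) ** (i - 1)


def M_series(z, terms=2000):
    """z-transform as a truncated power series sum_i p_i z^i."""
    return sum(pmf(i) * z ** i for i in range(terms))


def M_closed(z):
    """Closed form of the z-transform quoted in the example."""
    return (0.482 - 0.041 * z) / (1 - 0.559 * z)


total = sum(pmf(i) for i in range(2000))       # should be close to 1
mean = sum(i * pmf(i) for i in range(2000))    # should be close to M'(1)

# Roots of 0.559 z^2 - 1.041 z + 0.482 = 0, i.e. of z = M(z)
a, b, c = 0.559, -1.041, 0.482
disc = math.sqrt(b * b - 4 * a * c)
roots = sorted(((-b - disc) / (2 * a), (-b + disc) / (2 * a)))
print(total, mean, roots)
```

The smaller root is the probability of extinction; the larger root is the trivial solution $z_1 = 1$.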


Some individuals have the potential to produce a huge number of offspring (locusts, 
turtles, fish), even if only a few of them may reach adulthood (defined by the time 
when being capable of reproduction). In these cases a distribution allowing for 
infinite offspring is a suitable model. For human populations, a truncated distribution 
(page 71) can be expected to provide best results. For instance, consider the truncated 
$p_0$-modified geometric distribution with upper limit $m$, i.e., $m$ is the maximal number of offspring an individual can produce. Since the probability $p_0$ is estimated directly from the sample, it is not subject to truncation. Given the probabilities (8.61), making use of the series (2.118), the truncated $p_0$-modified geometric distribution $\{p_0, p_1, \ldots, p_m\}$ is for any $p_0$ with $0 < p_0 < 1$ defined by

$$p_0, \quad p_i = \frac{1 - p_0}{1 - (1 - p)^m}\; p\,(1 - p)^{i-1}; \quad i = 1, 2, \ldots, m. \tag{8.62}$$
Example 8.19 A female thrush produces up to 4 eggs a year from which adult birds arise. The random number $Y$ of such eggs has the distribution $p_i = P(Y = i)$ with

$$p_0 = 0.32, \quad p_1 = 0.24, \quad p_2 = 0.28, \quad p_3 = 0.10, \quad p_4 = 0.06.$$

The corresponding mean value is $E(Y) = 1.34$, and the z-transform is

$$M(z) = 0.32 + 0.24\,z + 0.28\,z^2 + 0.10\,z^3 + 0.06\,z^4.$$

The probability of extinction of the whole offspring of the zeroth generation thrush in one of the subsequent generations is the smallest solution of the equation $M(z) = z$. This solution is $\pi_0 = 0.579$. □
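For a polynomial z-transform like the one in example 8.19, the smallest solution of $M(z) = z$ can be located by bisection. The following sketch relies on the fact that $M(z) - z$ is positive at $z = 0$ and negative just below $z = 1$ when $E(Y) > 1$; the function name and the bracket $[0, 0.999]$ are assumptions chosen for this illustration.

```python
def M(z):
    """z-transform of the thrush offspring distribution (example 8.19)."""
    return 0.32 + 0.24 * z + 0.28 * z**2 + 0.10 * z**3 + 0.06 * z**4


def smallest_fixed_point(M, lo=0.0, hi=0.999, iters=200):
    """Bisection on g(z) = M(z) - z.

    Assumes g(lo) > 0 and g(hi) < 0, which holds here because
    M(0) = p_0 > 0 and E(Y) = M'(1) > 1.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if M(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mean = 0.24 + 2 * 0.28 + 3 * 0.10 + 4 * 0.06
pi0 = smallest_fixed_point(M)
print(mean, round(pi0, 3))
```

Bisection is used instead of a closed-form root because the equation is of degree four; any other bracketing root finder would serve equally well.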


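Example 8.20 Let the random number of offspring $Y$ have a mixed Poisson distribution with continuous structure parameter $L$ with density $f(\lambda)$. Then $Y$ has the z-transform (see page 98)

$$M(z) = \int_0^{\infty} e^{\lambda (z - 1)} f(\lambda)\, d\lambda.$$

The structure parameter $L$ is supposed to have a Gamma distribution with density given by (2.74) (page 75):

$$f(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda}, \quad \lambda > 0,\ \alpha > 0,\ \beta > 0.$$

Then $M(z)$ becomes

$$M(z) = \int_0^{\infty} e^{\lambda (z - 1)} f(\lambda)\, d\lambda = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} e^{-(\beta + 1 - z)\lambda}\, \lambda^{\alpha - 1}\, d\lambda.$$

Substituting $x = (\beta + 1 - z)\lambda$ gives the final form of $M(z)$:

$$M(z) = \left( \frac{\beta}{\beta + 1 - z} \right)^{\alpha}.$$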


From formula (7.58) on page 281 we know that this is the z-transform of a negative binomial distribution with parameters $\alpha$ and $\beta$. Its first derivative is

$$M'(z) = \frac{\alpha\, \beta^{\alpha}}{(\beta + 1 - z)^{\alpha + 1}}.$$

Hence, the mean number of offspring is $E(Y) = M'(1) = \alpha/\beta$. A general solution, different from 1, of equation $M(z) = z$ has a complicated structure. Hence, only two special cases are considered.


1) $\alpha = 1$: In this case the structure parameter $L$ has an exponential distribution with parameter $\beta$. The equation $M(z) = z$ becomes

$$z^2 - (\beta + 1)\, z + \beta = 0,$$

and the solutions are $z_1 = 1$ and $z_2 = \beta$. Hence, the probability of extinction will be $\pi_0 = 1$ for $\beta \ge 1$ and $\pi_0 = \beta$ for $\beta < 1$.

This result is in line with $E(Y) = 1/\beta \le 1$ for $\beta \ge 1$.

2) $\alpha = 2$, $\beta = 1.2$: In this case equation $M(z) = z$ becomes

$$z^3 - 4.4\,z^2 + 4.84\,z - 1.44 = 0.$$

The solutions in the interval $(0, 1]$ are $z_1 = 1$ and $z_2 = 0.496$. Hence, the probability of extinction is $\pi_0 = 0.496$. □
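Both special cases of example 8.20 can be reproduced by iterating $z \leftarrow M(z)$ from $z = 0$ with $M(z) = (\beta/(\beta + 1 - z))^{\alpha}$. The sketch below is an illustration only; the function names and the particular values $\beta = 0.5$ and $\beta = 2$ used for case 1 are assumptions made here.

```python
def negbin_M(z, alpha, beta):
    """z-transform (beta / (beta + 1 - z))**alpha from example 8.20."""
    return (beta / (beta + 1.0 - z)) ** alpha


def extinction_probability(M, tol=1e-13, max_iter=1_000_000):
    """Iterate z <- M(z) from z = 0 up to the smallest fixed point."""
    z = 0.0
    for _ in range(max_iter):
        z_next = M(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


# Case 1 (alpha = 1): pi_0 = beta for beta < 1 and pi_0 = 1 for beta >= 1.
pi0_beta_small = extinction_probability(lambda z: negbin_M(z, 1.0, 0.5))
pi0_beta_large = extinction_probability(lambda z: negbin_M(z, 1.0, 2.0))

# Case 2 (alpha = 2, beta = 1.2): after dividing the cubic by (z - 1),
# the remaining quadratic is z^2 - 3.4 z + 1.44 = 0.
pi0_case2 = extinction_probability(lambda z: negbin_M(z, 2.0, 1.2))
z2_quadratic = (3.4 - 5.8 ** 0.5) / 2.0

print(pi0_beta_small, pi0_beta_large, pi0_case2, z2_quadratic)
```

The critical value $\beta = 1$ in case 1 is deliberately avoided in this sketch, since the fixed-point iteration converges only very slowly when $E(Y) = 1$.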


8.6 EXERCISES 


8.1) A Markov chain $\{X_0, X_1, \ldots\}$ has state space $Z = \{0, 1, 2\}$ and transition matrix

$$P = \begin{pmatrix} 0.5 & 0 & 0.5 \\ 0.4 & 0.2 & 0.4 \\ 0 & 0.4 & 0.6 \end{pmatrix}.$$

(1) Determine $P(X_2 = 2 \mid X_1 = 0, X_0 = 1)$ and $P(X_2 = 2, X_1 = 0 \mid X_0 = 1)$.
(2) Determine $P(X_2 = 2, X_1 = 0 \mid X_0 = 0)$ and, for $n > 1$, $P(X_{n+1} = 2, X_n = 0 \mid X_{n-1} = 0)$.
(3) Assuming the initial distribution
$P(X_0 = 0) = 0.4$; $P(X_0 = 1) = P(X_0 = 2) = 0.3$,
determine $P(X_1 = 2)$ and $P(X_1 = 1, X_2 = 2)$.
8.2) A Markov chain $\{X_0, X_1, \ldots\}$ has state space $Z = \{0, 1, 2\}$ and transition matrix

$$P = \begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.8 & 0.2 & 0 \\ 0.6 & 0 & 0.4 \end{pmatrix}.$$

(1) Determine the matrix of the 2-step transition probabilities $P^{(2)}$.
(2) Given the initial distribution $P(X_0 = i) = 1/3$; $i = 0, 1, 2$; determine the probabilities $P(X_2 = 0)$ and $P(X_0 = 0, X_1 = 1, X_2 = 2)$.


8.3) A Markov chain $\{X_0, X_1, \ldots\}$ has state space $Z = \{0, 1, 2\}$ and transition matrix

$$P = \begin{pmatrix} 0 & 0.4 & 0.6 \\ 0.8 & 0 & 0.2 \\ 0.5 & 0.5 & 0 \end{pmatrix}.$$

(1) Given the initial distribution $P(X_0 = 0) = P(X_0 = 1) = 0.4$ and $P(X_0 = 2) = 0.2$, determine $P(X_3 = 2)$.


(2) Draw the corresponding transition graph. 


(3) Determine the stationary distribution. 


8.4) Let $\{Y_0, Y_1, \ldots\}$ be a sequence of independent, identically distributed binary random variables with $P(Y_i = 0) = P(Y_i = 1) = 1/2$; $i = 0, 1, \ldots$. Define a sequence of random variables $\{X_1, X_2, \ldots\}$ by

$$X_n = \tfrac{1}{2}\,(Y_n - Y_{n-1}); \quad n = 1, 2, \ldots$$

Check whether the random sequence $\{X_1, X_2, \ldots\}$ has the Markov property.


8.5) A Markov chain $\{X_0, X_1, \ldots\}$ has state space $Z = \{0, 1, 2, 3\}$ and transition matrix

$$P = \begin{pmatrix} 0.1 & 0.2 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.1 & 0.4 \\ 0.4 & 0.1 & 0.3 & 0.2 \\ 0.3 & 0.4 & 0.2 & 0.1 \end{pmatrix}.$$

(1) Draw the corresponding transition graph.

(2) Determine the stationary distribution of this Markov chain.


8.6) Let $\{X_0, X_1, \ldots\}$ be an irreducible Markov chain with state space $Z = \{1, 2, \ldots, n\}$, $n < \infty$, and with the doubly stochastic transition matrix $P = ((p_{ij}))$, i.e.,

$$\sum_{j \in Z} p_{ij} = 1 \ \text{ for all } i \in Z \quad \text{and} \quad \sum_{i \in Z} p_{ij} = 1 \ \text{ for all } j \in Z.$$

(1) Prove that the stationary distribution of $\{X_0, X_1, \ldots\}$ is $\{\pi_j = 1/n,\ j \in Z\}$.
(2) Can $\{X_0, X_1, \ldots\}$ be a transient Markov chain?
8.7) Prove formulas (8.20), page 346, for the mean times to absorption in a random
walk with two absorbing barriers (example 8.3). 


8.8) Show that the vector $\pi = (\pi_1 = \alpha,\ \pi_2 = \beta,\ \pi_3 = \gamma)$, determined in example 8.6, is
a stationary initial distribution with regard to a Markov chain which has the one-step 
transition matrix (8.22) (page 349). 


8.9) A source emits symbols 0 and 1 for transmission to a receiver. Random noises $S_1, S_2, \ldots$ successively and independently affect the transmission process of a symbol in the following way: if a '0' ('1') is to be transmitted, then $S_i$ distorts it to a '1' ('0') with probability $p$ ($q$); $i = 1, 2, \ldots$. Let $X_0 = 0$ or $X_0 = 1$ denote whether the source has emitted a '0' or a '1' for transmission. Further, let $X_i = 0$ or $X_i = 1$ denote whether the attack of noise $S_i$ implies the transmission of a '0' or a '1'; $i = 1, 2, \ldots$. The random sequence $\{X_0, X_1, \ldots\}$ is an irreducible Markov chain with state space $Z = \{0, 1\}$ and transition matrix

$$P = \begin{pmatrix} 1 - p & p \\ q & 1 - q \end{pmatrix}.$$

(1) Verify: On condition $0 < p + q < 1$, the $m$-step transition matrix is given by

$$P^{(m)} = \frac{1}{p + q} \begin{pmatrix} q & p \\ q & p \end{pmatrix} + \frac{(1 - p - q)^m}{p + q} \begin{pmatrix} p & -p \\ -q & q \end{pmatrix}.$$

(2) Let $p = q = 0.1$. The transmission of the symbols 0 and 1 is affected by the random noises $S_1, S_2, \ldots, S_5$. Determine the probability that a '0' emitted by the source is actually received.


8.10) Weather is classified as (predominantly) sunny (S) and (predominantly) cloudy 
(C), where C includes rain. For the town of Musi, a fairly reliable prediction of 
tomorrow's weather can only be made on the basis of today's and yesterday's weather. 
Let (C,S) indicate that the weather yesterday was cloudy and today's weather is sunny 
and so on. Based on past observations it is known that, given the constellation (S,S) 
today, the weather tomorrow will be sunny with probability 0.8 and cloudy with prob- 
ability 0.2; given (S,C) today, the weather tomorrow will be sunny with probability 
0.4 and cloudy with probability 0.6; given (C,S) today, the weather tomorrow will be 
sunny with probability 0.6 and cloudy with probability 0.4; given (C,C) today, the 
weather tomorrow will be cloudy with probability 0.8 and sunny with probability 0.2. 


(1) Illustrate graphically the transition between the states 

1=(S,S), 2=(S,C), 3 =(C,S), and 4 =(C,C). 
(2) Determine the matrix of the transition probabilities of the corresponding discrete- 
time Markov chain and its stationary state distribution. 
8.11) A supplier of toner cartridges of a certain brand checks her stock every Monday. If the stock is less than or equal to $s$ cartridges, she orders an amount of $S - s$ cartridges, which will be available the following Monday, $0 < s < S$. The weekly demands of cartridges $D$ are independent and identically distributed according to

$$p_i = P(D = i); \quad i = 0, 1, \ldots.$$

Let $X_n$ be the number of cartridges in stock on the $n$th Sunday (no business over weekends) given that the supplier starts her business on a Monday.

(1) Is $\{X_1, X_2, \ldots\}$ a Markov chain?

(2) If yes, determine its transition probabilities.
8.12) A Markov chain has state space Z = {0,1,2,3,4} and transition matrix 


( 05 01 04 0 0 \ 
08 02 0 0 0 


(1) Determine the minimal closed sets. 
(2) Identify essential and inessential states. 
(3) What are the recurrent and transient states? 


8.13) A Markov chain has state space Z = {0, 1,2,3} and transition matrix 


fn” ay i 
1. G2 20..- O 
04 06 0 0 
0.1 04 02 03 


Determine the classes of essential and inessential states. 


8.14) A Markov chain has state space Z = {0, 1,2,3,4} and transition matrix 


$$P = \begin{pmatrix} 0 & 0.2 & 0.8 & 0 & 0 \\ 0 & 0 & 0 & 0.9 & 0.1 \\ 0 & 0 & 0 & 0.1 & 0.9 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$$


(1) Draw the transition graph. 
(2) Verify that this Markov chain is irreducible with period 3. 
(3) Determine the stationary distribution. 
8.15) A Markov chain has state space Z = {0,1,2,3,4} and transition matrix


a 


P= 0.2 0.2 02 0.4 
0.2 08 0 0 
04 0.1 01 0 04 
(1) Find the essential and inessential states. 
(2) Find the recurrent and transient states. 


8.16) Determine the stationary distribution of the random walk considered in example 8.12 on condition $p_i = p$, $0 < p < 1$.


8.17) The weekly power consumption of a town depends on the weekly average 
temperature in that town. The weekly average temperature, observed over a long time 
span in the month of August, has been partitioned into 4 classes (in °C):


1 = [10, 15), 2 = [15, 20), 3 = [20, 25), 4 = [25, 30].


The weekly average temperature fluctuations between the classes in August follow a 
homogeneous Markov chain with transition matrix 


iC Sout. dies os vod ' 
02 03 03 02 
0.1 04 04 O.1 
0 02 05 03 


When the weekly average temperatures are in class 1, 2, 3 or 4, the respective aver- 
age power consumption per week is 1.5, 1.3, 1.2, and 1.3 [in MW]. (The increase 
from class 3 to class 4 is due to air conditioning.) 


What is the average power consumption in the long run in August?


8.18) A household insurer knows that the total annual claim size X of clients in a
certain portfolio has a normal distribution with mean value $800 and standard
deviation $260. The insurer partitions his clients into classes 1, 2, and 3 depending
on the annual amounts they claim, and the class they belong to (all costs in $):

A client, who is in class 1 in the current year, will make a transition to class 1, 2, or 3
next year, when his respective total claims are between 0 and 600, 600 and 1000, or
greater than 1000 in the current year.

A client, who is in class 2 in the current year, will make a transition to class 1, 2, or 3 
next year if his respective total claim sizes are between 0 and 500, 500 and 900, or 
more than 900. 

A client, who is in class 3 and claims between 0 and 400, between 400 and 800, or at 
least 800 in the current year, will be in class 1, 2, or in class 3 next year, respectively. 
When in class 1, 2, or 3, the clients will pay the respective premiums 500, 800, or
1000 a year. 


(1) What is the average annual contribution of a client in the long run?
(2) Does the insurer make any profit under this policy in the long run?


8.19) Two gamblers 1 and 2 begin a game with stakes of sizes $3 and $4, respec- 
tively. After each move a gambler either wins or loses $1 or the gambler's stake 
remains constant. These possibilities are controlled by the transition probabilities 


$$p_0 = 0,\ p_1 = 0.5,\ p_2 = 0.4,\ p_3 = 0.2,\ p_4 = 0.4,\ p_5 = 0.5,\ p_6 = 0.6,\ p_7 = 0,$$
$$q_7 = 0,\ q_6 = 0.5,\ q_5 = 0.4,\ q_4 = 0.2,\ q_3 = 0.4,\ q_2 = 0.5,\ q_1 = 0.6,\ q_0 = 0.$$

(In the notation of Figure 8.3, $p_i = p_{i,i+1}$ and $q_i = p_{i,i-1}$.) The game is finished as


soon as a gambler has won the entire stake of the other one or, equivalently, if one 
gambler has lost her/his entire stake. 


(1) Determine the probability that gambler 1 wins.
(2) Determine the mean time till any of the gamblers win. 


8.20) Analogously to example 8.17 (page 369), consider a population with a 
maximal size of z=5 individuals, which comprises at the beginning of its obser- 
vation 3 individuals. Its birth and death probabilities with regard to a time unit are 


$$p_0 = 0,\ p_1 = 0.6,\ p_2 = 0.4,\ p_3 = 0.2,\ p_4 = 0.4,\ p_5 = 0,$$
$$q_0 = 0,\ q_1 = 0.4,\ q_2 = 0.4,\ q_3 = 0.6,\ q_4 = 0.5,\ q_5 = 0.8.$$


(1) What is the probability of extinction of this population? 


(2) Determine its mean time to extinction. 


8.21) Let the transition probabilities of a birth and death process be given by

$$p_i = \frac{1}{1 + [i/(i+1)]^2} \quad \text{and} \quad q_i = 1 - p_i; \quad i = 1, 2, \ldots; \quad p_0 = 1.$$

Show that the process is transient.


8.22) Let $i$ and $j$ be two different states with $f_{ij} = f_{ji} = 1$. Show that both $i$ and $j$ are recurrent.


8.23) The respective transition probabilities of two irreducible Markov chains 1 and 2 with common state space $Z = \{0, 1, \ldots\}$ are for all $i = 0, 1, \ldots$,

$$\text{(1)}\ \ p_{i,i+1} = \frac{1}{i+2}, \quad p_{i0} = \frac{i+1}{i+2} \qquad \text{and} \qquad \text{(2)}\ \ p_{i,i+1} = \frac{i+1}{i+2}, \quad p_{i0} = \frac{1}{i+2}.$$

Check whether these Markov chains are transient, null recurrent, or positive recurrent.


8.24) Let $N_i$ be the random number of time periods a discrete-time Markov chain stays in state $i$ (sojourn time of the Markov chain in state $i$).
Determine $E(N_i)$ and $\mathrm{Var}(N_i)$.


8.25) A Galton-Watson process starts with one individual. The random number of 
offspring Y of this individual has the z-transform 


M(z) = (0.6z+0.4)?. 
(1) What type of probability distribution has Y (see section 2.5.1)? 
(2) Determine the probabilities P(Y = 4). 
(3) What is the corresponding probability of extinction? 


(4) Let T be the random time to extinction. Determine the probability P(T= 2) by 
applying formula (8.60). Verify this result by applying the total probability rule to 
P(T=2). 


8.26) A Galton-Watson process starts with one individual. The random number of 
offspring Y of this individual has the z-transform 


$$M(z) = e^{1.5\,(z - 1)}.$$
(1) What is the underlying probability distribution of Y? 
(2) Determine the corresponding probability of extinction. 


(3) Let T be the random time to extinction. Determine the probability P(T = 3) by
applying formula (8.60). 


8.27) (1) Determine the z-transform of the truncated, $p_0$-modified geometric distribution given by formula (8.62).


(2) Determine the corresponding probability of extinction $\pi_0$ if $m = 6$, $p_0 = 0.482$, and $p = 0.441$.


(3) Compare this $\pi_0$ with the probability of extinction obtained in example 8.18 without truncation, but under otherwise the same assumptions.


8.28) Assume a Galton-Watson process starts with $X_0 = n > 1$ individuals.


Determine the corresponding probability of extinction given that the same Galton-Watson process, when starting with one individual, has probability of extinction $\pi_0$.


8.29) Given $X_0 = 1$, show that the probability of extinction $\pi_0$ satisfies equation

$$M(\pi_0) = \pi_0$$

by applying the total probability rule (condition with regard to the number of offspring of the individual in the zeroth generation). Make use of the answer to exercise 8.28.


CHAPTER 9 


Continuous-Time Markov Chains 


9.1 BASIC CONCEPTS AND EXAMPLES 


This chapter deals with Markov processes which have parameter set $T = [0, \infty)$ and state space $Z = \{0, \pm 1, \pm 2, \ldots\}$ or subsets of it. According to the terminology introduced in section 6.3, for having a discrete state space, this class of Markov processes is called Markov chains.


Definition 9.1 A stochastic process $\{X(t), t \ge 0\}$ with parameter set $T$ and discrete state space $Z$ is called a continuous-time Markov chain or a Markov chain in continuous time if, for any $n \ge 1$ and arbitrary sequences

$$\{t_0, t_1, \ldots, t_{n+1}\} \text{ with } t_0 < t_1 < \cdots < t_{n+1} \quad \text{and} \quad \{i_0, i_1, \ldots, i_{n+1}\},\ i_k \in Z,$$

the following relationship holds:

$$P(X(t_{n+1}) = i_{n+1} \mid X(t_n) = i_n, \ldots, X(t_1) = i_1, X(t_0) = i_0) = P(X(t_{n+1}) = i_{n+1} \mid X(t_n) = i_n). \tag{9.1}$$
The intuitive interpretation of the Markov property (9.1) is the same as for dis- 
crete-time Markov chains: 


The future development of a continuous-time Markov chain depends only on 
its present state and not on its evolution in the past. 


The conditional probabilities

$$p_{ij}(s, t) = P(X(t) = j \mid X(s) = i); \quad s < t; \quad i, j \in Z;$$

are the transition probabilities of the Markov chain. A Markov chain is said to be homogeneous if for all $s, t \in T$ and $i, j \in Z$ the transition probabilities $p_{ij}(s, t)$ depend only on the difference $t - s$:

$$p_{ij}(s, t) = p_{ij}(0, t - s).$$

In this case the transition probabilities depend only on one variable:

$$p_{ij}(t) = p_{ij}(0, t).$$


Note This chapter only considers homogeneous Markov chains. Hence no confusion can arise 
if only Markov chains are referred to. 


The transition probabilities are comprised in the matrix of transition probabilities $P$ (simply: transition matrix):

$$P(t) = ((p_{ij}(t))); \quad i, j \in Z.$$

Besides the trivial property $p_{ij}(t) \ge 0$, transition probabilities are generally assumed to satisfy the conditions

$$\sum_{j \in Z} p_{ij}(t) = 1, \quad t \ge 0, \quad i \in Z. \tag{9.2}$$


Comment It is theoretically possible that, for some $i \in Z$,

$$\sum_{j \in Z} p_{ij}(t) < 1; \quad t > 0. \tag{9.3}$$

In this case, unboundedly many transitions between the states may occur in any finite time interval $[0, t)$ with positive probability

$$1 - \sum_{j \in Z} p_{ij}(t).$$


This situation approximately applies to nuclear chain reactions and population explo- 
sions of certain species of insects (e.g., locusts). Henceforth it is assumed that 


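$$\lim_{t \to +0} p_{ii}(t) = 1. \tag{9.4}$$

By (9.2), this assumption is equivalent to

$$p_{ij}(0) = \lim_{t \to +0} p_{ij}(t) = \delta_{ij}; \quad i, j \in Z. \tag{9.5}$$

The Kronecker symbol $\delta_{ij}$ is defined by formula (8.4), page 340.
Analogously to (8.5), the Chapman-Kolmogorov equations are

$$p_{ij}(t + \tau) = \sum_{k \in Z} p_{ik}(t)\, p_{kj}(\tau) \tag{9.6}$$

for any $t \ge 0$, $\tau \ge 0$, and $i, j \in Z$. By making use of the total probability rule, the homogeneity, and the Markov property, formula (9.6) is proved as follows:

$$\begin{aligned}
p_{ij}(t + \tau) &= P(X(t + \tau) = j \mid X(0) = i) = \frac{P(X(t + \tau) = j,\ X(0) = i)}{P(X(0) = i)} \\
&= \sum_{k \in Z} \frac{P(X(t + \tau) = j,\ X(t) = k,\ X(0) = i)}{P(X(0) = i)} \\
&= \sum_{k \in Z} \frac{P(X(t + \tau) = j \mid X(t) = k,\ X(0) = i)\; P(X(t) = k,\ X(0) = i)}{P(X(0) = i)} \\
&= \sum_{k \in Z} \frac{P(X(t + \tau) = j \mid X(t) = k)\; P(X(t) = k \mid X(0) = i)\; P(X(0) = i)}{P(X(0) = i)} \\
&= \sum_{k \in Z} P(X(\tau) = j \mid X(0) = k)\; P(X(t) = k \mid X(0) = i) \\
&= \sum_{k \in Z} p_{ik}(t)\, p_{kj}(\tau).
\end{aligned}$$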
Absolute and Stationary Distributions Let $p_i(t) = P(X(t) = i)$ be the probability that the Markov chain is in state $i$ at time $t$. $p_i(t)$ is called the absolute state probability (of the Markov chain) at time $t$. Hence, $\{p_i(t),\ i \in Z\}$ is said to be the absolute (one-dimensional) probability distribution of the Markov chain at time $t$. In particular, $\{p_i(0);\ i \in Z\}$ is called an initial (probability) distribution of the Markov chain. By the total probability rule, given an initial distribution, the absolute probability distribution of the Markov chain at time $t$ is

$$p_j(t) = \sum_{i \in Z} p_i(0)\, p_{ij}(t), \quad j \in Z. \tag{9.7}$$


For determining the multidimensional distribution of the Markov chain at time points $t_0, t_1, \ldots, t_n$ with $0 \le t_0 < t_1 < \cdots < t_n < \infty$, only its absolute probability distribution at time $t_0$ and its transition probabilities need to be known. This can be proved by repeated application of the formula of the conditional probability (1.22) and by making use of homogeneity of the Markov chain:

$$P(X(t_0) = i_0,\ X(t_1) = i_1,\ \ldots,\ X(t_n) = i_n) = p_{i_0}(t_0)\; p_{i_0 i_1}(t_1 - t_0)\; p_{i_1 i_2}(t_2 - t_1) \cdots p_{i_{n-1} i_n}(t_n - t_{n-1}). \tag{9.8}$$

Definition 9.2 An initial distribution $\{\pi_i = p_i(0),\ i \in Z\}$ is said to be stationary if

$$\pi_i = p_i(t) \quad \text{for all } t \ge 0 \text{ and } i \in Z. \tag{9.9}$$

Thus, if at time $t = 0$ the initial state is determined by a stationary initial distribution, then the absolute state probabilities $p_j(t)$ do not depend on $t$ and are equal to $\pi_j$. Consequently, the stationary initial probabilities $\pi_j$ are the absolute state probabilities $p_j(t)$ for all $j \in Z$ and $t \ge 0$. Moreover, it follows from (9.8) that in this case all $n$-dimensional distributions of the Markov chain, namely

$$\{P(X(t_1 + h) = i_1,\ X(t_2 + h) = i_2,\ \ldots,\ X(t_n + h) = i_n)\}, \quad i_k \in Z, \tag{9.10}$$

do not depend on $h$, i.e. if the process starts with a stationary initial distribution, then the Markov chain is strictly stationary. (This result once more verifies the more general statement of theorem 6.1, page 234.) Moreover, it is justified to call $\{\pi_i,\ i \in Z\}$ a stationary (probability) distribution of the Markov chain.


Example 9.1 The homogeneous Poisson process $\{N(t), t \ge 0\}$ with intensity $\lambda$ is a homogeneous Markov chain with state space $Z = \{0, 1, \ldots\}$ and transition probabilities

$$p_{ij}(t) = \frac{(\lambda t)^{j - i}}{(j - i)!}\, e^{-\lambda t}; \quad i \le j.$$

The sample paths of the process $\{N(t), t \ge 0\}$ are nondecreasing step functions. Its trend function is linearly increasing: $m(t) = E(N(t)) = \lambda t$. Thus, a stationary initial distribution cannot exist. (But, by the corollary following definition 7.1 (page 259), the homogeneous Poisson process is a stationary point process.) □
Example 9.2 At time $t = 0$ exactly $n$ systems start operating. Their lifetimes are independent, identically distributed exponential random variables with parameter $\lambda$. Let $X(t)$ be the number of systems still operating at time $t$. Then $\{X(t), t \ge 0\}$ is a Markov chain with state space $Z = \{0, 1, \ldots, n\}$, transition probabilities

$$p_{ij}(t) = \binom{i}{j} \left(1 - e^{-\lambda t}\right)^{i - j} e^{-j \lambda t}, \quad n \ge i \ge j \ge 0,$$

and initial distribution $P(X(0) = n) = 1$. The structure of these transition probabilities is due to the memoryless property of the exponential distribution (see example 2.21, page 87). Of course, this Markov chain cannot be stationary. □
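As a plausibility check, the binomial transition probabilities of example 9.2 can be tested numerically against condition (9.2) and the Chapman-Kolmogorov equations (9.6). The parameter values $n = 5$, $\lambda = 0.7$, $t = 0.4$, $\tau = 1.1$ and the function names in this sketch are arbitrary assumptions made for illustration.

```python
import math


def p(i, j, t, lam):
    """Transition probability p_ij(t) of example 9.2 (0 for j > i)."""
    if j > i:
        return 0.0
    survive = math.exp(-lam * t)      # one system survives a period of length t
    return math.comb(i, j) * (1.0 - survive) ** (i - j) * survive ** j


n, lam, t, tau = 5, 0.7, 0.4, 1.1     # illustrative values (assumptions)

# Condition (9.2): every row of P(t) sums to 1.
row_sums = [sum(p(i, j, t, lam) for j in range(n + 1)) for i in range(n + 1)]

# Chapman-Kolmogorov (9.6): p_ij(t + tau) = sum_k p_ik(t) p_kj(tau).
max_gap = 0.0
for i in range(n + 1):
    for j in range(n + 1):
        lhs = p(i, j, t + tau, lam)
        rhs = sum(p(i, k, t, lam) * p(k, j, tau, lam) for k in range(n + 1))
        max_gap = max(max_gap, abs(lhs - rhs))

print(row_sums, max_gap)
```

Up to rounding errors the largest gap should vanish, in contrast to the matrix of example 9.3.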


Example 9.3 Let $Z = \{0, 1\}$ be the state space and

$$P(t) = \begin{pmatrix} \dfrac{1}{t+1} & \dfrac{t}{t+1} \\[1.5ex] \dfrac{t}{t+1} & \dfrac{1}{t+1} \end{pmatrix}$$

the transition matrix of a stochastic process $\{X(t), t \ge 0\}$. It is to be checked whether this process is a Markov chain. Assuming the initial distribution

$$p_0(0) = P(X(0) = 0) = 1$$

and applying formula (9.7) yields the absolute probability of state 0 at time $t = 3$:

$$p_0(3) = p_0(0)\, p_{00}(3) = 1/4.$$

On the other hand, applying (9.6) with $t = 2$ and $\tau = 1$ yields the (wrong) result

$$p_0(3) = p_{00}(2)\, p_{00}(1) + p_{01}(2)\, p_{10}(1) = 1/2.$$

Therefore, the Chapman-Kolmogorov equations (9.6) are not valid so that $\{X(t), t \ge 0\}$ cannot be a Markov chain. □
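The two conflicting values in example 9.3 can be reproduced with exact rational arithmetic. The sketch assumes that the matrix has the entries $1/(t+1)$ on the diagonal and $t/(t+1)$ off the diagonal, which is the form consistent with the values $1/4$ and $1/2$ quoted in the example; the function name is hypothetical.

```python
from fractions import Fraction


def P(t):
    """Candidate transition matrix of example 9.3 (assumed symmetric form)."""
    t = Fraction(t)
    diag = 1 / (t + 1)
    off = t / (t + 1)
    return [[diag, off], [off, diag]]


# Direct evaluation of p_00(3)
direct = P(3)[0][0]

# Chapman-Kolmogorov evaluation with t = 2 and tau = 1
P2, P1 = P(2), P(1)
via_ck = P2[0][0] * P1[0][0] + P2[0][1] * P1[1][0]

print(direct, via_ck)
```

Since the two results differ, the family $P(t)$ violates (9.6) even though every single $P(t)$ is a stochastic matrix.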


Classification of States The classification concepts already introduced for discrete- 
time Markov chains can analogously be defined for continuous-time Markov chains. 
In what follows, some concepts are defined, but not discussed in detail. 


A state set $C \subset Z$ is called closed if

$$p_{ij}(t) = 0 \quad \text{for all } t > 0,\ i \in C \text{ and } j \notin C.$$

If, in particular, $\{i\}$ is a closed set, then $i$ is called an absorbing state. The state $j$ is accessible from $i$ if there exists a $t$ with $p_{ij}(t) > 0$.

If $i$ and $j$ are accessible from each other, then they are said to communicate. Thus, equivalence classes, essential, and inessential states, as well as irreducible and reducible Markov chains can be defined as in section 8.2 for discrete Markov chains.

State $i$ is recurrent (transient) if

$$\int_0^{\infty} p_{ii}(t)\, dt = \infty \qquad \left( \int_0^{\infty} p_{ii}(t)\, dt < \infty \right).$$

A recurrent state $i$ is positive recurrent if the mean value of its recurrence time (time between two successive occurrences of state $i$) is finite. Since it can easily be shown that $p_{ij}(t_0) > 0$ implies $p_{ij}(t) > 0$ for all $t > t_0$, introducing the concept of a period analogously to section 8.2.3 makes no sense.


9.2 TRANSITION PROBABILITIES AND RATES 
This section discusses some structural properties of continuous-time Markov chains, 
which are fundamental to mathematically modeling real systems. 


Theorem 9.1 On condition (9.4), the transition probabilities $p_{ij}(t)$ are differentiable in $[0, \infty)$ for all $i, j \in Z$.

Proof For any $h > 0$, the Chapman-Kolmogorov equations (9.6) yield

$$p_{ij}(t + h) - p_{ij}(t) = \sum_{k \in Z} p_{ik}(h)\, p_{kj}(t) - p_{ij}(t) = -(1 - p_{ii}(h))\, p_{ij}(t) + \sum_{k \in Z,\, k \ne i} p_{ik}(h)\, p_{kj}(t).$$

Thus,

$$-(1 - p_{ii}(h)) \le -(1 - p_{ii}(h))\, p_{ij}(t) \le p_{ij}(t + h) - p_{ij}(t) \le \sum_{k \in Z,\, k \ne i} p_{ik}(h)\, p_{kj}(t) \le \sum_{k \in Z,\, k \ne i} p_{ik}(h) = 1 - p_{ii}(h).$$

Hence,

$$|p_{ij}(t + h) - p_{ij}(t)| \le 1 - p_{ii}(h).$$

The uniform continuity of the transition probabilities and, therefore, their differentiability for all $t \ge 0$ is now a consequence of assumption (9.4). ∎


Transition Rates The following limits play an important role in future derivations. For any $i, j \in Z$, let

$$q_i = \lim_{h \to 0} \frac{1 - p_{ii}(h)}{h}, \tag{9.11}$$

$$q_{ij} = \lim_{h \to 0} \frac{p_{ij}(h)}{h}, \quad i \ne j. \tag{9.12}$$

These limits exist, since by (9.5),

$$p_{ii}(0) = 1 \quad \text{and} \quad p_{ij}(0) = 0 \text{ for } i \ne j$$

so that, by theorem 9.1,

$$p'_{ii}(0) = \left. \frac{d p_{ii}(t)}{dt} \right|_{t=0} = -q_i, \tag{9.13}$$

$$p'_{ij}(0) = \left. \frac{d p_{ij}(t)}{dt} \right|_{t=0} = q_{ij}, \quad i \ne j. \tag{9.14}$$

For $h \to 0$, relationships (9.13) and (9.14) are equivalent to

$$p_{ii}(h) = 1 - q_i h + o(h), \tag{9.15}$$

$$p_{ij}(h) = q_{ij} h + o(h), \quad i \ne j, \tag{9.16}$$

respectively. The parameters $q_i$ and $q_{ij}$ are the transition rates of the Markov chain. More exactly, $q_i$ is the unconditional transition rate of leaving state $i$ for any other state, and $q_{ij}$ is the conditional transition rate of making a transition from state $i$ to state $j$. According to (9.2),

$$\sum_{j \in Z,\, j \ne i} q_{ij} = q_i, \quad i \in Z. \tag{9.17}$$

Kolmogorov's Differential Equations In what follows, systems of differential equa- 
tions for the transition probabilities and the absolute state probabilities of a Markov 
chain are derived. For this purpose, the system of the Chapman-Kolmogorov equa- 
tions is written in the form 


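$$p_{ij}(t + h) = \sum_{k \in Z} p_{ik}(h)\, p_{kj}(t).$$

It follows that

$$\frac{p_{ij}(t + h) - p_{ij}(t)}{h} = \sum_{k \ne i} \frac{p_{ik}(h)}{h}\, p_{kj}(t) - \frac{1 - p_{ii}(h)}{h}\, p_{ij}(t).$$

By (9.13) and (9.14), letting $h \to 0$ yields Kolmogorov's backward equations for the transition probabilities:

$$p'_{ij}(t) = \sum_{k \ne i} q_{ik}\, p_{kj}(t) - q_i\, p_{ij}(t), \quad t \ge 0. \tag{9.18}$$

Analogously, starting with

$$p_{ij}(t + h) = \sum_{k \in Z} p_{ik}(t)\, p_{kj}(h)$$

yields Kolmogorov's forward equations for the transition probabilities:

$$p'_{ij}(t) = \sum_{k \ne j} p_{ik}(t)\, q_{kj} - q_j\, p_{ij}(t), \quad t \ge 0. \tag{9.19}$$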
Let $\{p_i(0),\ i \in Z\}$ be any initial distribution. Multiplying Kolmogorov's forward equations (9.19) by $p_i(0)$ and summing with respect to $i$ yields

$$\begin{aligned}
\sum_{i \in Z} p_i(0)\, p'_{ij}(t) &= \sum_{i \in Z} p_i(0) \sum_{k \ne j} p_{ik}(t)\, q_{kj} - \sum_{i \in Z} p_i(0)\, q_j\, p_{ij}(t) \\
&= \sum_{k \ne j} q_{kj} \sum_{i \in Z} p_i(0)\, p_{ik}(t) - q_j \sum_{i \in Z} p_i(0)\, p_{ij}(t).
\end{aligned}$$

Thus, in view of (9.7), the absolute state probabilities satisfy the system of linear differential equations

$$p'_j(t) = \sum_{k \ne j} q_{kj}\, p_k(t) - q_j\, p_j(t), \quad t \ge 0, \quad j \in Z. \tag{9.20}$$

In future, the absolute state probabilities are assumed to satisfy

$$\sum_{i \in Z} p_i(t) = 1. \tag{9.21}$$

This normalizing condition is always fulfilled if $Z$ is finite.


Note If the initial distribution has structure

$$p_i(0) = 1, \quad p_j(0) = 0 \text{ for } j \ne i,$$

then the absolute state probabilities are equal to the transition probabilities $p_j(t) = p_{ij}(t)$, $j \in Z$.

Transition Times and Transition Rates It is only possible to exactly model real 
systems by continuous-time Markov chains if the lengths of the time periods between 
changes of states are exponentially distributed, since in this case the 'memoryless 
property’ of the exponential distribution (example 2.21, page 87) implies the Markov 
property. If the times between transitions have known exponential distributions, then 
it is no problem to determine the transition rates. For instance, if the sojourn time of 
a Markov chain in state 0 has an exponential distribution with parameter $\lambda_0$, then, according to (9.11), the unconditional rate of leaving this state is given by

$$q_0 = \lim_{h \to 0} \frac{1 - p_{00}(h)}{h} = \lim_{h \to 0} \frac{1 - e^{-\lambda_0 h}}{h} = \lim_{h \to 0} \frac{\lambda_0 h + o(h)}{h} = \lambda_0 + \lim_{h \to 0} \frac{o(h)}{h}.$$

Hence,

$$q_0 = \lambda_0. \tag{9.22}$$

Now, let the sojourn time of a Markov chain in state 0 have structure

$$Y_0 = \min(Y_{01}, Y_{02}),$$

where $Y_{01}$ and $Y_{02}$ are independent exponential random variables with respective parameters $\lambda_1$ and $\lambda_2$. If $Y_{01} < Y_{02}$, the Markov chain makes a transition to state 1 and if $Y_{01} > Y_{02}$ to state 2. Thus, by (9.12), the conditional transition rate from state 0 to state 1 is

$$\begin{aligned}
q_{01} &= \lim_{h \to 0} \frac{p_{01}(h)}{h} = \lim_{h \to 0} \frac{(1 - e^{-\lambda_1 h})\, e^{-\lambda_2 h} + o(h)}{h} \\
&= \lim_{h \to 0} \frac{\lambda_1 h\, (1 - \lambda_2 h)}{h} + \lim_{h \to 0} \frac{o(h)}{h} \\
&= \lim_{h \to 0} (\lambda_1 - \lambda_1 \lambda_2 h) = \lambda_1.
\end{aligned}$$

Hence, since the roles of $Y_{01}$ and $Y_{02}$ can be interchanged,

$$q_{01} = \lambda_1, \quad q_{02} = \lambda_2, \quad q_0 = \lambda_1 + \lambda_2. \tag{9.23}$$

The results (9.22) and (9.23) will be generalized in section 9.4.
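The limits behind (9.22) and (9.23) can be observed numerically by evaluating the difference quotients for decreasing $h$. The rates $\lambda_1 = 2$ and $\lambda_2 = 3$ and the function names in this sketch are assumptions for illustration; the $o(h)$ terms of the derivation are simply dropped.

```python
import math

lam1, lam2 = 2.0, 3.0        # illustrative rates (assumptions)


def q01_quotient(h):
    """(1 - exp(-lam1 h)) * exp(-lam2 h) / h, the leading part of p_01(h)/h."""
    return -math.expm1(-lam1 * h) * math.exp(-lam2 * h) / h


def q0_quotient(h):
    """(1 - p_00(h)) / h with p_00(h) = P(min(Y_01, Y_02) > h)."""
    return -math.expm1(-(lam1 + lam2) * h) / h


for h in (1e-1, 1e-3, 1e-6):
    print(h, q01_quotient(h), q0_quotient(h))
```

`math.expm1` is used instead of `1 - math.exp(...)` to avoid cancellation for very small $h$; the quotients should approach $\lambda_1$ and $\lambda_1 + \lambda_2$, respectively.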


Transition Graphs Establishing the Kolmogorov equations can be facilitated by 
transition graphs. These graphs are constructed analogously to the transition graphs 
for discrete-time Markov chains: The nodes of a transition graph represent the states 
of the Markov chain. A (directed) edge from node i to node j exists if and only if 
$q_{ij} > 0$. The edges are weighted by their corresponding transition rates. Thus, two sets of edges (possibly empty ones) can be assigned to each node $i$: first, edges with initial node $i$ and, second, edges with end node $i$, that is, edges which leave node $i$ and edges which end in node $i$. The unconditional transition rate $q_i$ equals the sum of the weights of all those edges leaving node $i$. If there is an edge ending in state $i$ and no edge leaving state $i$, then $i$ is an absorbing state.


Example 9.4 (system with renewal) The lifetime $L$ of a system has an exponential distribution with parameter $\lambda$. After a failure the system is replaced by an equivalent new one. A replacement takes a random time $Z$, which is exponentially distributed with parameter $\mu$. All life- and replacement times are assumed to be independent. Thus, the operation of the system can be described by an alternating renewal process (section 7.3.6) with 'typical renewal cycle' $(L, Z)$. Consider the Markov chain $\{X(t), t \ge 0\}$ defined by

$$X(t) = \begin{cases} 1 & \text{if the system is operating,} \\ 0 & \text{if the system is being replaced.} \end{cases}$$

Its state space is $Z = \{0, 1\}$. The absolute state probability $p_1(t) = P(X(t) = 1)$ of this Markov chain is the point availability of the system at time $t$.

In this simple example, only state changes from 0 to 1 and from 1 to 0 are possible. Hence, by (9.22),

$$q_0 = q_{01} = \mu \quad \text{and} \quad q_1 = q_{10} = \lambda.$$

Figure 9.1 Transition graph of an alternating renewal process (example 9.4)

The corresponding Kolmogorov differential equations (9.20) are

$$p'_0(t) = -\mu\, p_0(t) + \lambda\, p_1(t),$$
$$p'_1(t) = +\mu\, p_0(t) - \lambda\, p_1(t).$$

These two equations are linearly dependent. (The sums of the left-hand sides and of the right-hand sides are equal to 0.) Replacing $p_0(t)$ in the second equation by $1 - p_1(t)$ yields a first-order nonhomogeneous differential equation with constant coefficients for $p_1(t)$:

$$p'_1(t) + (\lambda + \mu)\, p_1(t) = \mu.$$

Given the initial condition $p_1(0) = 1$, the solution is

$$p_1(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t}, \quad t \ge 0.$$

The corresponding stationary availability is

$$\pi_1 = \lim_{t \to \infty} p_1(t) = \frac{\mu}{\lambda + \mu}.$$

In example 7.19 (page 322) the same results have been obtained by applying the Laplace transform. (There the notation $L = Y$ is used.) □
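The closed-form point availability of example 9.4 can be compared with a direct numerical integration of the system (9.20). The following sketch uses a classical fourth-order Runge-Kutta scheme written out by hand; the rates $\lambda = 0.5$ and $\mu = 2$, the step count, and all function names are assumptions chosen for illustration.

```python
import math


def derivative(p, q):
    """Right-hand side of (9.20): p_j' = sum_{k != j} q_kj p_k - q_j p_j."""
    n = len(p)
    dp = []
    for j in range(n):
        inflow = sum(q[k][j] * p[k] for k in range(n) if k != j)
        q_j = sum(q[j][k] for k in range(n) if k != j)   # formula (9.17)
        dp.append(inflow - q_j * p[j])
    return dp


def rk4(p, q, t_end, steps):
    """Integrate p' = derivative(p, q) from 0 to t_end with fixed step size."""
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(p, q)
        k2 = derivative([x + 0.5 * h * d for x, d in zip(p, k1)], q)
        k3 = derivative([x + 0.5 * h * d for x, d in zip(p, k2)], q)
        k4 = derivative([x + h * d for x, d in zip(p, k3)], q)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


lam, mu = 0.5, 2.0                     # illustrative rates (assumptions)
q = [[0.0, mu],                        # q_01 = mu  (replacement finished)
     [lam, 0.0]]                       # q_10 = lam (system fails)

t_end = 2.0
p_num = rk4([0.0, 1.0], q, t_end, 2000)          # start in state 1
p1_exact = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t_end)
print(p_num[1], p1_exact, mu / (lam + mu))
```

The same two helper functions apply unchanged to any finite state space once the matrix of conditional rates $q_{ij}$ is specified, which is why the sketch is not specialised to two states.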


Example 9.5 (two-unit redundant system, standby redundancy) A system consists
of two identical units. The system is available if and only if at least one of its units is
available. If both units are available, then one of them is in standby redundancy (cold
redundancy), that is, in this state it does not age and cannot fail. After the failure of a
unit, the other one (if available) is immediately switched from the redundancy state
to the operating state and the replacement of the failed unit begins. The replaced unit
becomes the standby unit if the other unit is still operating. Otherwise it immediately
resumes its work. The lifetimes and replacement times of the units are independent
random variables, identically distributed as $L$ and $Z$, respectively. $L$ and $Z$ are assumed
to be exponentially distributed with respective parameters $\lambda$ and $\mu$. Let $L_s$ denote
the system lifetime, i.e. the random time to a system failure. A system failure occurs
when a unit fails whilst the other unit is being replaced. A Markov chain $\{X(t),\ t \ge 0\}$
with state space $\mathbf{Z} = \{0, 1, 2\}$ is introduced in the following way: $X(t) = i$ if $i$ units
are unavailable at time $t$. Let $Y_i$ be the unconditional sojourn time of the system in
state $i$ and $Y_{ij}$ be the conditional sojourn time of the system in state $i$ given that the
system makes a transition from state $i$ into state $j$. From state 0, the system can only
make a transition to state 1. Hence, $Y_0 = Y_{01} = L$. According to (9.22), the
corresponding transition rate is given by

$$q_0 = q_{01} = \lambda.$$

Figure 9.2 Transition graph for example 9.5 a)

If the system makes a transition from state 1 to state 2, then its conditional sojourn
time in state 1 is $Y_{12} = L$, whereas in case of a transition to state 0, it stays a time
$Y_{10} = Z$ in state 1. The unconditional sojourn time of the system in state 1 is

$$Y_1 = \min(L, Z).$$

Thus, by (9.23), the corresponding transition rates are

$$q_{12} = \lambda, \quad q_{10} = \mu, \quad \text{and} \quad q_1 = \lambda + \mu.$$

When the system returns from state 1 to state 0, then it again spends time $L$ in state 0,
since the operating unit is 'as good as new' in view of the memoryless property of
the exponential distribution.


a) Survival probability In this case, only the time to entering state 2 (system failure)
is of interest. Hence, state 2 must be considered absorbing (Figure 9.2) so that

$$q_{20} = q_{21} = 0.$$

The survival probability of the system has the structure

$$\bar{F}_s(t) = P(L_s > t) = p_0(t) + p_1(t).$$

The corresponding system of differential equations (9.20) is

$$p_0'(t) = -\lambda\, p_0(t) + \mu\, p_1(t),$$
$$p_1'(t) = +\lambda\, p_0(t) - (\lambda+\mu)\, p_1(t), \tag{9.24}$$
$$p_2'(t) = +\lambda\, p_1(t).$$

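This system of differential equations will be solved on condition that both units are
available at time $t = 0$, that is, $p_0(0) = 1$ and $p_1(0) = 0$. Combining the first two
differential equations in (9.24) yields a homogeneous differential equation of the second
order with constant coefficients, which is satisfied by both $p_0(t)$ and $p_1(t)$. For $p_1(t)$,

$$p_1''(t) + (2\lambda+\mu)\, p_1'(t) + \lambda^2\, p_1(t) = 0.$$

The corresponding characteristic equation is

$$x^2 + (2\lambda+\mu)\, x + \lambda^2 = 0.$$

Its solutions are

$$x_{1,2} = -\left(\lambda + \frac{\mu}{2}\right) \pm \sqrt{\lambda\mu + \mu^2/4}.$$

Hence, since $p_1(0) = 0$, for $t \ge 0$,

$$p_1(t) = a\, e^{-\frac{2\lambda+\mu}{2}t} \sinh\frac{c}{2}t \quad \text{with} \quad c = \sqrt{4\lambda\mu + \mu^2}.$$

Since $p_0(0) = 1$, the second differential equation in (9.24) gives $p_1'(0) = \lambda$, so that
$a = 2\lambda/c$. The first differential equation in (9.24) then yields

$$p_0(t) = e^{-\frac{2\lambda+\mu}{2}t} \left(\frac{\mu}{c}\sinh\frac{c}{2}t + \cosh\frac{c}{2}t\right), \quad t \ge 0.$$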


Thus, the survival probability of the system is

$$\bar{F}_s(t) = e^{-\frac{2\lambda+\mu}{2}t} \left[\cosh\frac{c}{2}t + \frac{2\lambda+\mu}{c}\sinh\frac{c}{2}t\right], \quad t \ge 0.$$

(For a definition of the hyperbolic functions sinh and cosh, see page 265.) The mean
value of the system lifetime $L_s$ is most easily obtained from formula (2.52), page 64:

$$E(L_s) = \frac{2}{\lambda} + \frac{\mu}{\lambda^2}. \tag{9.25}$$

For the sake of comparison, in case of no replacement ($\mu = 0$), the system lifetime
$L_s$ has an Erlang distribution with parameters 2 and $\lambda$:

$$\bar{F}_s(t) = (1 + \lambda t)\, e^{-\lambda t}, \quad E(L_s) = 2/\lambda.$$
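The mean lifetime of the standby system can be checked numerically by integrating the survival probability over $[0, \infty)$. The following Python sketch does this with the trapezoidal rule; the function names, the truncation point of the integral and the rate values are assumptions made for illustration. The hyperbolic functions are written out as exponentials to avoid overflow for large $t$.

```python
import math


def survival_standby(lam, mu, t):
    """Survival probability of the two-unit standby system (example 9.5 a)."""
    if mu == 0.0:
        # no replacement: Erlang distribution with parameters 2 and lam
        return (1.0 + lam * t) * math.exp(-lam * t)
    b = (2.0 * lam + mu) / 2.0
    c = math.sqrt(4.0 * lam * mu + mu * mu)
    w = 2.0 * b / c  # coefficient (2*lam + mu)/c of the sinh term
    # exp(-b t) * (cosh(c t/2) + w sinh(c t/2)), expanded into two exponentials
    return 0.5 * ((1.0 + w) * math.exp(-(b - c / 2.0) * t)
                  + (1.0 - w) * math.exp(-(b + c / 2.0) * t))


def mean_lifetime_numeric(lam, mu, t_max=200.0, steps=100000):
    """Approximate E(L_s) by the trapezoidal rule on [0, t_max]."""
    h = t_max / steps
    total = 0.5 * (survival_standby(lam, mu, 0.0) + survival_standby(lam, mu, t_max))
    for k in range(1, steps):
        total += survival_standby(lam, mu, k * h)
    return total * h


lam, mu = 1.0, 2.0  # assumed rates
print(mean_lifetime_numeric(lam, mu), 2.0 / lam + mu / lam ** 2)
```

The two printed numbers should agree up to the discretization error, in line with (9.25).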


b) Availability If the replacement of failed units is continued after system failures,
then the point availability

$$A(t) = p_0(t) + p_1(t)$$

of the system is of particular interest. In this case, the transition rate $q_{21}$ from state 2
to state 1 is positive. However, $q_{21}$ depends on the number $r = 1$ or $r = 2$ of mechanics
who are in charge of the replacement of failed units. Assuming that a mechanic
cannot replace two failed units at the same time, then (see Figure 9.3)

$$q_{21} = q_2 = r\mu.$$

For $r = 2$, the sojourn time of the system in state 2 is given by $Y_2 = \min(Z_1, Z_2)$,
where $Z_1$ and $Z_2$ are independent and identically distributed as $Z$. Analogously, the
sojourn time in state 1 is given by $Y_1 = \min(L, Z)$.

Figure 9.3 Transition graph for example 9.5 b)

Hence, the transition rates $q_{10}$ and $q_{12}$ have the same values as under a). The
corresponding system of differential equations (9.20) becomes, when replacing the
last differential equation by the normalizing condition (9.21),

$$p_0'(t) = -\lambda\, p_0(t) + \mu\, p_1(t),$$
$$p_1'(t) = +\lambda\, p_0(t) - (\lambda+\mu)\, p_1(t) + r\mu\, p_2(t),$$
$$1 = p_0(t) + p_1(t) + p_2(t).$$

The solution is left as an exercise to the reader. □


Figure 9.4 Transition graph for example 9.6 a)


Example 9.6 (two-unit system, parallel redundancy) Now assume that both units of
the system operate at the same time when they are available. All other assumptions
and the notation of the previous example are retained. In particular, the system is
available if and only if at least one unit is available. In view of the initial condition
$p_0(0) = 1$, the system spends

$$Y_0 = \min(L_1, L_2)$$

time units in state 0. $Y_0$ has an exponential distribution with parameter $2\lambda$, and from
state 0 only a transition to state 1 is possible. Therefore, $Y_0 = Y_{01}$ and

$$q_0 = q_{01} = 2\lambda.$$

When the system is in state 1, then it behaves as in example 9.5:

$$q_{10} = \mu, \quad q_{12} = \lambda, \quad q_1 = \lambda + \mu.$$


a) Survival probability As in the previous example, state 2 has to be thought of as
absorbing: $q_{20} = q_{21} = 0$ (Figure 9.4). Hence, from (9.20) and (9.21),

$$p_0'(t) = -2\lambda\, p_0(t) + \mu\, p_1(t),$$
$$p_1'(t) = +2\lambda\, p_0(t) - (\lambda+\mu)\, p_1(t),$$
$$1 = p_0(t) + p_1(t) + p_2(t).$$

Combining the first two differential equations yields a homogeneous differential
equation of the second order with constant coefficients for $p_0(t)$:

$$p_0''(t) + (3\lambda+\mu)\, p_0'(t) + 2\lambda^2\, p_0(t) = 0.$$

The solution is

$$p_0(t) = e^{-\frac{3\lambda+\mu}{2}t} \left[\cosh\frac{c}{2}t + \frac{\mu-\lambda}{c}\sinh\frac{c}{2}t\right],$$

where

$$c = \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}.$$

Furthermore,

$$p_1(t) = \frac{4\lambda}{c}\, e^{-\frac{3\lambda+\mu}{2}t} \sinh\frac{c}{2}t.$$

The survival probability of the system is

$$\bar{F}_s(t) = P(L_s > t) = p_0(t) + p_1(t).$$

Hence,

$$\bar{F}_s(t) = e^{-\frac{3\lambda+\mu}{2}t} \left[\cosh\frac{c}{2}t + \frac{3\lambda+\mu}{c}\sinh\frac{c}{2}t\right], \quad t \ge 0. \tag{9.26}$$

The mean system lifetime is

$$E(L_s) = \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2}.$$

Figure 9.5 Transition graph for example 9.6 b)

For the sake of comparison, in the case without replacement ($\mu = 0$),

$$\bar{F}_s(t) = 2\, e^{-\lambda t} - e^{-2\lambda t}, \quad E(L_s) = \frac{3}{2\lambda}.$$
b) Availability If $r$ ($r = 1$ or $r = 2$) mechanics replace failed units, then

$$q_2 = q_{21} = r\mu.$$

The other transition rates are the same as those under a) (Figure 9.5). The absolute
state probabilities satisfy the system of differential equations

$$p_0'(t) = -2\lambda\, p_0(t) + \mu\, p_1(t),$$
$$p_1'(t) = +2\lambda\, p_0(t) - (\lambda+\mu)\, p_1(t) + r\mu\, p_2(t),$$
$$1 = p_0(t) + p_1(t) + p_2(t).$$

Solving this system of linear differential equations is left to the reader. □


9.3 STATIONARY STATE PROBABILITIES


If $\{\pi_j,\ j \in \mathbf{Z}\}$ is a stationary distribution of the Markov chain $\{X(t),\ t \ge 0\}$, then this
special absolute distribution must satisfy Kolmogorov's equations (9.20). Since the
$\pi_j$ are constant, all the left-hand sides of these equations are equal to 0. Therefore,
the system of linear differential equations (9.20) simplifies to a system of linear
algebraic equations in the unknowns $\pi_j$:

$$0 = \sum_{k \in \mathbf{Z},\, k \ne j} q_{kj}\, \pi_k - q_j\, \pi_j, \quad j \in \mathbf{Z}. \tag{9.27}$$

This system of equations is frequently written in the form

$$q_j\, \pi_j = \sum_{k \in \mathbf{Z},\, k \ne j} q_{kj}\, \pi_k, \quad j \in \mathbf{Z}. \tag{9.28}$$


This form clearly illustrates that the stationary state probabilities refer to an
equilibrium state of the Markov chain:

The mean intensity per unit time of leaving state $j$, which is $q_j \pi_j$, is equal to
the mean intensity per unit time of arriving at state $j$.

According to assumption (9.21), only those solutions $\{\pi_j,\ j \in \mathbf{Z}\}$ of (9.27) which
satisfy the normalizing condition are of interest:

$$\sum_{j \in \mathbf{Z}} \pi_j = 1. \tag{9.29}$$


It is now assumed that the Markov chain is irreducible and positive recurrent. (Recall
that an irreducible Markov chain with finite state space $\mathbf{Z}$ is always positive
recurrent.) Then it can be shown that a unique stationary distribution $\{\pi_j,\ j \in \mathbf{Z}\}$ exists,
which satisfies (9.27) and (9.29). Moreover, in this case the limits

$$p_j = \lim_{t \to \infty} p_{ij}(t)$$

exist and are independent of $i$. Hence, for any initial distribution, the limits of the
absolute state probabilities $\lim_{t \to \infty} p_j(t)$ exist, and they are equal to $p_j$:

$$p_j = \lim_{t \to \infty} p_j(t), \quad j \in \mathbf{Z}. \tag{9.30}$$

Furthermore, for all $j \in \mathbf{Z}$,

$$\lim_{t \to \infty} p_j'(t) = 0.$$

Otherwise, $p_j(t)$ would increase unboundedly as $t \to \infty$, contradicting $p_j(t) \le 1$.
Hence, when passing to the limit as $t \to \infty$ in (9.20) and (9.21), the limits (9.30) are
seen to satisfy the system of equations (9.27) and (9.29). Since this system has a
unique solution, the limits $p_j$ and the stationary probabilities $\pi_j$ must coincide:

$$p_j = \pi_j, \quad j \in \mathbf{Z}.$$

For a detailed discussion of the relationship between the solvability of (9.27) and the
existence of a stationary distribution, see Feller (1968).


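Continuation of Example 9.5 (two-unit system, standby redundancy) Since the
system is available if at least one unit is available, its stationary availability is

$$A = \pi_0 + \pi_1.$$

When substituting the transition rates from Figure 9.3 into (9.27) and (9.29), the $\pi_j$
are seen to satisfy the following system of algebraic equations:

$$-\lambda\, \pi_0 + \mu\, \pi_1 = 0,$$
$$+\lambda\, \pi_0 - (\lambda+\mu)\, \pi_1 + r\mu\, \pi_2 = 0,$$
$$\pi_0 + \pi_1 + \pi_2 = 1.$$

Case r = 1

$$\pi_0 = \frac{\mu^2}{(\lambda+\mu)^2 - \lambda\mu}, \quad \pi_1 = \frac{\lambda\mu}{(\lambda+\mu)^2 - \lambda\mu}, \quad \pi_2 = \frac{\lambda^2}{(\lambda+\mu)^2 - \lambda\mu},$$

$$A = \pi_0 + \pi_1 = \frac{\mu^2 + \lambda\mu}{(\lambda+\mu)^2 - \lambda\mu}.$$

Case r = 2

$$\pi_0 = \frac{2\mu^2}{(\lambda+\mu)^2 + \mu^2}, \quad \pi_1 = \frac{2\lambda\mu}{(\lambda+\mu)^2 + \mu^2}, \quad \pi_2 = \frac{\lambda^2}{(\lambda+\mu)^2 + \mu^2},$$

$$A = \pi_0 + \pi_1 = \frac{2\mu^2 + 2\lambda\mu}{(\lambda+\mu)^2 + \mu^2}.$$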
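For a finite state space, the stationary probabilities can also be obtained numerically by solving the linear system (9.27) together with (9.29). The following Python sketch uses a plain Gauss-Jordan elimination; the function name, the matrix layout `q[i][j]` for the transition rates and the numerical values are assumptions chosen for illustration.

```python
def stationary_distribution(q):
    """Solve the balance equations (9.27) with normalization (9.29).

    q[i][j] is the transition rate from state i to state j (i != j);
    diagonal entries are ignored. The chain is assumed to be irreducible.
    """
    n = len(q)
    # Row j encodes (9.27) for state j: sum_k q_kj * pi_k - q_j * pi_j = 0.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if k != j:
                a[j][k] = q[k][j]
        a[j][j] = -sum(q[j][k] for k in range(n) if k != j)
    # The balance equations are linearly dependent, so the last one is
    # replaced by the normalizing condition pi_0 + ... + pi_{n-1} = 1.
    a[n - 1] = [1.0] * n + [1.0]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


lam, mu = 1.0, 2.0  # assumed rates
for r in (1, 2):
    # standby system of example 9.5 b) with r mechanics
    q = [[0.0, lam, 0.0],
         [mu, 0.0, lam],
         [0.0, r * mu, 0.0]]
    pi = stationary_distribution(q)
    print(r, pi, "availability:", pi[0] + pi[1])
```

With these rates the output can be compared directly with the closed-form expressions for the cases $r = 1$ and $r = 2$.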


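Continuation of Example 9.6 (two-unit system, parallel redundancy) Given the
transition rates in Figure 9.5, the $\pi_j$ are solutions of

$$-2\lambda\, \pi_0 + \mu\, \pi_1 = 0,$$
$$+2\lambda\, \pi_0 - (\lambda+\mu)\, \pi_1 + r\mu\, \pi_2 = 0,$$
$$\pi_0 + \pi_1 + \pi_2 = 1.$$

Case r = 1

$$\pi_0 = \frac{\mu^2}{(\lambda+\mu)^2 + \lambda^2}, \quad \pi_1 = \frac{2\lambda\mu}{(\lambda+\mu)^2 + \lambda^2}, \quad \pi_2 = \frac{2\lambda^2}{(\lambda+\mu)^2 + \lambda^2},$$

$$A = \pi_0 + \pi_1 = \frac{\mu^2 + 2\lambda\mu}{(\lambda+\mu)^2 + \lambda^2}.$$

Case r = 2

$$\pi_0 = \left(\frac{\mu}{\lambda+\mu}\right)^2, \quad \pi_1 = \frac{2\lambda\mu}{(\lambda+\mu)^2}, \quad \pi_2 = \left(\frac{\lambda}{\lambda+\mu}\right)^2,$$

$$A = \pi_0 + \pi_1 = 1 - \left(\frac{\lambda}{\lambda+\mu}\right)^2.$$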

Figure 9.6 shows a) the mean lifetimes and b) the stationary availabilities of the
two-unit system for $r = 1$ as functions of $\rho = \lambda/\mu$. As anticipated, standby redundancy
yields better results if switching a unit from a standby redundancy state to the operat- 
ing state is absolutely reliable. With parallel redundancy, this switching problem 
does not exist, since an available spare unit is also operating. 


Figure 9.6 Mean lifetime a) and stationary availability b)


Example 9.7 A system has two different failure types: type 1 and type 2. After a
type $i$-failure the system is said to be in failure state $i$; $i = 1, 2$. The time $L_i$ to a type
$i$-failure is assumed to have an exponential distribution with parameter $\lambda_i$, and the
random variables $L_1$ and $L_2$ are assumed to be independent. Thus, if at time $t = 0$ a
new system starts working, the time to its first failure is $Y_0 = \min(L_1, L_2)$. After a
type 1-failure, the system is switched from failure state 1 into failure state 2. The
time required for this is exponentially distributed with parameter $\nu$. After entering
failure state 2, the renewal of the system begins. A renewed system immediately
starts working. The renewal time is exponentially distributed with parameter $\mu$. This
process continues to infinity.

All life- and renewal times as well as switching times are assumed to be independent.
This model is, for example, of importance in traffic safety engineering: When the red
signal in a traffic light fails (type 1-failure), then the whole traffic light is switched
off (type 2-failure). That is, a dangerous failure state is removed by inducing a
blocking failure state.

Figure 9.7 Transition graph for example 9.7


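Consider the following system states:

0 system is operating
1 type 1-failure state
2 type 2-failure state

If $X(t)$ denotes the state of the system at time $t$, then $\{X(t),\ t \ge 0\}$ is a homogeneous
Markov chain with state space $\mathbf{Z} = \{0, 1, 2\}$. Its transition rates are (Figure 9.7)

$$q_{01} = \lambda_1, \quad q_{02} = \lambda_2, \quad q_0 = \lambda_1 + \lambda_2, \quad q_{12} = q_1 = \nu, \quad q_{20} = q_2 = \mu.$$

Hence, the stationary state probabilities satisfy the system of algebraic equations

$$-(\lambda_1 + \lambda_2)\, \pi_0 + \mu\, \pi_2 = 0,$$
$$\lambda_1\, \pi_0 - \nu\, \pi_1 = 0,$$
$$\pi_0 + \pi_1 + \pi_2 = 1.$$

The solution is

$$\pi_0 = \frac{\mu\nu}{(\lambda_1+\lambda_2)\,\nu + (\lambda_1+\nu)\,\mu},$$

$$\pi_1 = \frac{\lambda_1\mu}{(\lambda_1+\lambda_2)\,\nu + (\lambda_1+\nu)\,\mu},$$

$$\pi_2 = \frac{(\lambda_1+\lambda_2)\,\nu}{(\lambda_1+\lambda_2)\,\nu + (\lambda_1+\nu)\,\mu}. \quad □$$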


9.4 SOJOURN TIMES IN PROCESS STATES 


So far the fact has been used that independent, exponentially distributed times between 
changes of system states allow for modeling system behaviour by homogeneous 
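Markov chains. Conversely, it can be shown that for any $i \in \mathbf{Z}$ the sojourn time $Y_i$ of
a homogeneous Markov chain $\{X(t),\ t \ge 0\}$ in state $i$ also has an exponential
distribution: By properties (9.8) and (9.15) of a homogeneous Markov chain,

$$P(Y_i > t \mid X(0) = i) = P(X(s) = i,\ 0 < s \le t \mid X(0) = i)$$
$$= \lim_{n \to \infty} P\left(X\left(\tfrac{k}{n}t\right) = i;\ k = 1, 2, \ldots, n \,\Big|\, X(0) = i\right)$$
$$= \lim_{n \to \infty} \left[p_{ii}\left(\tfrac{t}{n}\right)\right]^n$$
$$= \lim_{n \to \infty} \left[1 - q_i \tfrac{t}{n} + o\left(\tfrac{1}{n}\right)\right]^n.$$

It follows that

$$P(Y_i > t \mid X(0) = i) = e^{-q_i t}, \quad t \ge 0, \tag{9.31}$$

since $e$ can be represented by the limit

$$e = \lim_{x \to \infty} \left(1 + \tfrac{1}{x}\right)^x. \tag{9.32}$$

Thus, $Y_i$ has an exponential distribution with parameter $q_i$.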


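Given $X(0) = i$, $X(Y_i) = X(Y_i + 0)$ is the state to which the Markov chain makes a
transition on leaving state $i$. Let $m(nt)$ be the greatest integer $m$ satisfying the
inequality $m/n \le t$ or, equivalently,

$$nt - 1 < m(nt) \le nt.$$

By making use of the geometric series, the joint probability distribution of the random
vector $(Y_i, X(Y_i))$, $i \ne j$, can be obtained as follows:

$$P(X(Y_i) = j,\ Y_i > t \mid X(0) = i)$$
$$= P(X(Y_i) = j,\ X(s) = i \text{ for } 0 < s \le t \mid X(0) = i)$$
$$= \lim_{n \to \infty} \sum_{m = m(nt)}^{\infty} P\left(X\left(\tfrac{m+1}{n}\right) = j,\ Y_i \in \left[\tfrac{m}{n}, \tfrac{m+1}{n}\right) \Big|\, X(0) = i\right)$$
$$= \lim_{n \to \infty} \sum_{m = m(nt)}^{\infty} P\left(X\left(\tfrac{m+1}{n}\right) = j,\ X\left(\tfrac{k}{n}\right) = i \text{ for } 1 \le k \le m \,\Big|\, X(0) = i\right)$$
$$= \lim_{n \to \infty} \sum_{m = m(nt)}^{\infty} \left[q_{ij}\tfrac{1}{n} + o\left(\tfrac{1}{n}\right)\right] \left[1 - q_i\tfrac{1}{n} + o\left(\tfrac{1}{n}\right)\right]^m$$
$$= \lim_{n \to \infty} \frac{q_{ij}\tfrac{1}{n} + o\left(\tfrac{1}{n}\right)}{q_i\tfrac{1}{n} + o\left(\tfrac{1}{n}\right)} \left[1 - q_i\tfrac{1}{n} + o\left(\tfrac{1}{n}\right)\right]^{m(nt)}.$$

Hence, by (9.32),

$$P(X(Y_i) = j,\ Y_i > t \mid X(0) = i) = \frac{q_{ij}}{q_i}\, e^{-q_i t}; \quad i \ne j;\ i, j \in \mathbf{Z}. \tag{9.33}$$

Passing to the marginal distribution of $Y_i$ (i.e., summing the equations (9.33) with
respect to $j \in \mathbf{Z}$) verifies (9.31).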


Two other important conclusions are:

1) Letting $t = 0$ in (9.33) yields the one-step transition probability from state $i$ into
state $j$:

$$p_{ij} = P(X(Y_i + 0) = j \mid X(0) = i) = \frac{q_{ij}}{q_i}, \quad j \in \mathbf{Z}. \tag{9.34}$$

2) The state following state $i$ is independent of $Y_i$ (and, of course, independent of the
history of the Markov chain before arriving at state $i$).


Knowledge of the transition probabilities $p_{ij}$ suggests observing a continuous-time
Markov chain $\{X(t),\ t \ge 0\}$ only at those discrete time points at which state changes
take place. Let $X_n$ be the state of the Markov chain immediately after the $n$th change
of state and $X_0 = X(0)$. Then $\{X_0, X_1, \ldots\}$ is a discrete-time homogeneous Markov
chain with transition probabilities given by (9.34):

$$p_{ij} = P(X_n = j \mid X_{n-1} = i) = \frac{q_{ij}}{q_i}; \quad i, j \in \mathbf{Z};\ n = 1, 2, \ldots \tag{9.35}$$

In this sense, the discrete-time Markov chain $\{X_0, X_1, \ldots\}$ is embedded in the
continuous-time Markov chain $\{X(t),\ t \ge 0\}$. Embedded Markov chains can also be
found in non-Markov processes. In these cases, they may facilitate the investigation 
of non-Markov processes. Actually, discrete-time Markov chains, which are embed- 
ded in arbitrary continuous-time stochastic processes, are frequently an efficient (if 
not the only) tool for analyzing these processes. Examples for the application of the 
method of embedded Markov chains to analyzing queueing systems are given in 
sections 9.7.3.2 and 9.7.3.3. Section 9.8 deals with semi-Markov chains, the frame- 
work of which is an embedded Markov chain. 
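The embedded chain together with the exponential sojourn times also gives a simple way to simulate a continuous-time Markov chain: draw an exponentially distributed sojourn time with parameter $q_i$, then choose the next state according to (9.34). The following Python sketch illustrates this under assumed names and rate values; `q[i][j]` denotes the transition rate from state $i$ to state $j$.

```python
import random


def jump_matrix(q):
    """Transition probabilities p_ij = q_ij / q_i of the embedded chain (9.34)."""
    n = len(q)
    p = []
    for i in range(n):
        qi = sum(q[i][j] for j in range(n) if j != i)
        p.append([q[i][j] / qi if (j != i and qi > 0) else 0.0 for j in range(n)])
    return p


def simulate_time_fractions(q, start, jumps, seed=0):
    """Simulate the chain for a given number of jumps and return the
    fraction of time spent in each state."""
    rng = random.Random(seed)
    n = len(q)
    p = jump_matrix(q)
    time_in = [0.0] * n
    state = start
    for _ in range(jumps):
        qi = sum(q[state][j] for j in range(n) if j != state)
        if qi == 0:  # absorbing state: no further jumps
            break
        time_in[state] += rng.expovariate(qi)  # sojourn time with parameter q_i
        state = rng.choices(range(n), weights=p[state])[0]
    total = sum(time_in)
    return [x / total for x in time_in] if total > 0 else time_in


# two-state system of example 9.4 with assumed rates lam = 1, mu = 4:
# state 0 (being replaced) is left with rate mu, state 1 (operating) with rate lam
q = [[0.0, 4.0],
     [1.0, 0.0]]
print(jump_matrix(q))
print(simulate_time_fractions(q, start=1, jumps=20000))
```

For a long simulation run, the fraction of time spent in state 1 should be close to the stationary availability $\mu/(\lambda+\mu)$.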


9.5 CONSTRUCTION OF MARKOV SYSTEMS 


In a Markov system, state changes are controlled by a Markov process. Markov
systems, in which the underlying Markov process is a homogeneous, continuous-time
Markov chain with state space $\mathbf{Z}$, are frequently special cases of the following basic
model: The sojourn time of the system in state $i$ is given by

$$Y_i = \min(Y_{i1}, Y_{i2}, \ldots, Y_{in_i}),$$

where the $Y_{ij}$ are independent, exponential random variables with parameters

$$\lambda_{ij}; \quad j = 1, 2, \ldots, n_i;\ i \in \mathbf{Z}.$$

A transition from state $i$ to state $j$ is made if and only if $Y_i = Y_{ij}$. If $X(t)$ as usual
denotes the state of the system at time $t$, then, by the memoryless property of the
exponential distribution, $\{X(t),\ t \ge 0\}$ is a homogeneous Markov chain with transition
rates

$$q_{ij} = \lim_{h \to 0} \frac{p_{ij}(h)}{h} = \lambda_{ij}, \quad q_i = \sum_{j=1}^{n_i} \lambda_{ij}.$$

This representation of $q_i$ results from (9.12) and (9.17). It reflects the fact that $Y_i$ as
the minimum of independent, exponentially distributed random variables $Y_{ij}$ also has
an exponential distribution, the parameter of which is obtained by summing the
parameters of the $Y_{ij}$.


Example 9.8 (repairman problem) $n$ machines with lifetimes $L_1, L_2, \ldots, L_n$ start
operating at time $t = 0$. The $L_i$ are assumed to be independent, exponential random
variables with parameter $\lambda$. Failed machines are repaired. A repaired machine is 'as
good as new'. There is one mechanic who can only handle one failed machine at a
time. Thus, when there are $k > 1$ failed machines, $k - 1$ have to wait for repair. The
repair times are assumed to be mutually independent and identically distributed as an
exponential random variable $Z$ with parameter $\mu$. Life- and repair times are
independent. Immediately after completion of its repair, a machine resumes its work.

Let $X(t)$ denote the number of machines which are in the failed state at time $t$. Then
$\{X(t),\ t \ge 0\}$ is a Markov chain with state space $\mathbf{Z} = \{0, 1, \ldots, n\}$. The system stays in
state 0 for a random time

$$Y_0 = \min(L_1, L_2, \ldots, L_n),$$

and then it makes a transition to state 1. The corresponding transition rate is

$$q_0 = q_{01} = n\lambda.$$

The system stays in state 1 for a random time

$$Y_1 = \min(L_1, L_2, \ldots, L_{n-1}, Z).$$

From state 1 it makes a transition to state 2 if $Y_1 = L_k$ for $k \in \{1, 2, \ldots, n-1\}$, and a
transition to state 0 if $Y_1 = Z$. Hence,

$$q_{10} = \mu, \quad q_{12} = (n-1)\lambda, \quad \text{and} \quad q_1 = (n-1)\lambda + \mu.$$

In general (Figure 9.8),

$$q_{j-1,j} = (n-j+1)\lambda; \quad j = 1, 2, \ldots, n,$$
$$q_{j+1,j} = \mu; \quad j = 0, 1, \ldots, n-1,$$
$$q_{ij} = 0; \quad |i-j| \ge 2,$$
$$q_j = (n-j)\lambda + \mu; \quad j = 1, 2, \ldots, n,$$
$$q_0 = n\lambda.$$


Figure 9.8 Transition graph for the repairman problem (example 9.8)

The corresponding system of equations (9.28) is

$$\mu\, \pi_1 = n\lambda\, \pi_0,$$
$$(n-j+1)\lambda\, \pi_{j-1} + \mu\, \pi_{j+1} = ((n-j)\lambda + \mu)\, \pi_j; \quad j = 1, 2, \ldots, n-1,$$
$$\mu\, \pi_n = \lambda\, \pi_{n-1}.$$

Beginning with the first equation, the stationary state probabilities are obtained by
successively solving for the $\pi_j$:

$$\pi_j = \frac{n!}{(n-j)!}\, \rho^j\, \pi_0; \quad j = 0, 1, \ldots, n,$$

where $\rho = \lambda/\mu$. From the normalizing condition (9.29),

$$\pi_0 = \left[\sum_{i=0}^{n} \frac{n!}{(n-i)!}\, \rho^i\right]^{-1}. \quad □$$
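The stationary distribution of the repairman problem is easily evaluated numerically. The following Python sketch computes the $\pi_j$ from the product form above; the function name and the chosen values of $n$ and $\rho$ are illustrative assumptions.

```python
from math import factorial


def repairman_stationary(n, rho):
    """Stationary probabilities pi_0, ..., pi_n of the repairman problem.

    pi_j is proportional to n!/(n-j)! * rho**j with rho = lam/mu.
    """
    weights = [factorial(n) / factorial(n - j) * rho ** j for j in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


pi = repairman_stationary(n=3, rho=0.5)  # assumed example values
print(pi)
print("mean number of failed machines:", sum(j * p for j, p in enumerate(pi)))
```

The result can be checked against the balance equations, for instance $\mu \pi_1 = n\lambda \pi_0$ and $\mu \pi_n = \lambda \pi_{n-1}$.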


Erlang's Phase Method Systems with Erlang distributed sojourn times in their
states can be transformed into Markov systems by introducing dummy states. This is
due to the fact that a random variable, which is Erlang distributed with parameters $n$
and $\mu$, can be represented as a sum of $n$ independent exponential random variables
with parameter $\mu$ (formula (7.21), page 263). Hence, if the time interval, which the
system stays in state $i$, is Erlang distributed with parameters $n_i$ and $\mu_i$, then this
interval is partitioned into $n_i$ disjoint subintervals (phases), the lengths of which are
independent, identically distributed exponential random variables with parameter $\mu_i$.
By introducing the new states $j_1, j_2, \ldots, j_{n_i}$ to label these phases, the original
non-Markov system becomes a Markov system. In what follows, instead of presenting a
general treatment of this approach, the application of Erlang's phase method is
demonstrated by an example:


Example 9.9 (two-unit system, parallel redundancy) As in example 9.6, a two-unit
system with parallel redundancy is considered. The lifetimes of the units are
identically distributed as an exponential random variable $L$ with parameter $\lambda$. The
replacement times of the units are identically distributed as $Z$, where $Z$ has an Erlang
distribution with parameters $n = 2$ and $\mu$. There is only one mechanic in charge of the
replacement of failed units. All other assumptions and model specifications are as in
example 9.6. The following system states are introduced:

0 both units are operating
1 one unit is operating, the replacement of the other one is in phase 1
2 one unit is operating, the replacement of the other one is in phase 2
3 no unit is operating, the replacement of the one being maintained is in phase 1
4 no unit is operating, the replacement of the one being maintained is in phase 2

The transition rates are (Figure 9.9):

Figure 9.9 Transition graph for example 9.9


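$$q_{01} = 2\lambda, \quad q_0 = 2\lambda,$$
$$q_{12} = \mu, \quad q_{13} = \lambda, \quad q_1 = \lambda + \mu,$$
$$q_{20} = \mu, \quad q_{23} = \lambda, \quad q_2 = \lambda + \mu,$$
$$q_{34} = \mu, \quad q_3 = \mu,$$
$$q_{41} = \mu, \quad q_4 = \mu.$$

Hence the stationary state probabilities satisfy the following system of equations:

$$\mu\, \pi_2 = 2\lambda\, \pi_0,$$
$$2\lambda\, \pi_0 + \mu\, \pi_4 = (\lambda+\mu)\, \pi_1,$$
$$\mu\, \pi_1 = (\lambda+\mu)\, \pi_2,$$
$$\lambda\, \pi_1 + \lambda\, \pi_2 = \mu\, \pi_3,$$
$$\mu\, \pi_3 = \mu\, \pi_4,$$
$$1 = \pi_0 + \pi_1 + \pi_2 + \pi_3 + \pi_4.$$

The stationary probabilities $\pi_i^* = P(\text{'}i \text{ units are failed'})$ are of particular interest:

$$\pi_0^* = \pi_0, \quad \pi_1^* = \pi_1 + \pi_2, \quad \pi_2^* = \pi_3 + \pi_4.$$

With $\rho = E(Z)/E(L) = 2\lambda/\mu$, solving the above system gives

$$\pi_0^* = \left[1 + 2\rho + \tfrac{5}{2}\rho^2 + \tfrac{1}{2}\rho^3\right]^{-1},$$

$$\pi_1^* = \left[2\rho + \tfrac{1}{2}\rho^2\right] \pi_0^*, \quad \pi_2^* = \left[2\rho^2 + \tfrac{1}{2}\rho^3\right] \pi_0^*.$$

The stationary system availability is given by $A = \pi_0^* + \pi_1^*$. □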
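The balance equations of example 9.9 can be solved by successive substitution, starting from $\pi_0$. The following Python sketch does so and can be used to check a closed-form solution numerically; the function name and the rate values are assumptions made for illustration.

```python
def example_9_9_stationary(lam, mu):
    """Stationary probabilities pi_0, ..., pi_4 from the balance equations
    of example 9.9, obtained by successive substitution."""
    x = lam / mu
    w0 = 1.0             # unnormalized weight of state 0
    w2 = 2.0 * x * w0    # mu*pi_2 = 2*lam*pi_0
    w1 = (1.0 + x) * w2  # mu*pi_1 = (lam + mu)*pi_2
    w3 = x * (w1 + w2)   # lam*pi_1 + lam*pi_2 = mu*pi_3
    w4 = w3              # mu*pi_3 = mu*pi_4
    weights = [w0, w1, w2, w3, w4]
    total = sum(weights)
    return [w / total for w in weights]


lam, mu = 1.0, 4.0  # assumed rates, so that rho = 2*lam/mu = 0.5
pi = example_9_9_stationary(lam, mu)
rho = 2.0 * lam / mu
print("pi:", pi)
print("1/pi_0:", 1.0 / pi[0], "polynomial:", 1 + 2 * rho + 2.5 * rho ** 2 + 0.5 * rho ** 3)
print("availability:", pi[0] + pi[1] + pi[2])
```

The balance equation for state 1, which is not used in the substitution, serves as a consistency check.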


Unfortunately, applying Erlang's phase method to structurally complicated systems
leads to rather complex Markov systems.


9.6 BIRTH AND DEATH PROCESSES


In this section, continuous-time Markov chains with the property that only transitions to
'neighboring' states are possible are discussed in more detail. These processes, called
(continuous-time) birth and death processes, have proved to be an important tool for
modeling queueing, reliability, and inventory systems. In economics, birth and death
processes are used, among other things, for describing the development of the
number of enterprises in a particular area and manpower fluctuations. In physics,
flows of radioactive, cosmic, and other particles are modeled by birth and death
processes. Their name, however, comes from applications in biology, where they
have been used to stochastically model the development in time of the number of
individuals in populations of organisms.


9.6.1 Birth Processes 


A continuous-time Markov chain with state space $\mathbf{Z} = \{0, 1, \ldots, n\}$ is called a (pure)
birth process if, for all $i = 0, 1, \ldots, n-1$, only a transition from state $i$ to $i+1$ is
possible. State $n$ is absorbing if $n < \infty$.

Thus, the positive transition rates of a birth process are given by $q_{i,i+1}$. Henceforth
they will be called birth rates and denoted as

$$\lambda_i = q_{i,i+1}, \quad i = 0, 1, \ldots, n-1,$$
$$\lambda_n = 0 \quad \text{for } n < \infty.$$

The sample paths of birth processes are nondecreasing step functions with jump
height 1. The homogeneous Poisson process with intensity $\lambda$ is the simplest example
of a birth process. In this case, $\lambda_i = \lambda$, $i = 0, 1, \ldots$. Given the initial distribution

$$p_m(0) = P(X(0) = m) = 1$$

(i.e., in the beginning the 'population' consists of $m$ individuals), the absolute state
probabilities $p_j(t)$ are equal to the transition probabilities $p_{mj}(t)$. In this case, the
probabilities $p_j(t)$ are identically equal to 0 for $j < m$ and, according to (9.20), for
$j \ge m$ they satisfy the system of linear differential equations

$$p_m'(t) = -\lambda_m\, p_m(t),$$
$$p_j'(t) = +\lambda_{j-1}\, p_{j-1}(t) - \lambda_j\, p_j(t); \quad j = m+1, m+2, \ldots \tag{9.36}$$
$$p_n'(t) = +\lambda_{n-1}\, p_{n-1}(t), \quad n < \infty.$$

The solution of the first differential equation is

$$p_m(t) = e^{-\lambda_m t}, \quad t \ge 0. \tag{9.37}$$


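For $j = m+1, m+2, \ldots$, the differential equations (9.36) are equivalent to

$$e^{\lambda_j t} \left(p_j'(t) + \lambda_j\, p_j(t)\right) = \lambda_{j-1}\, e^{\lambda_j t}\, p_{j-1}(t)$$

or

$$\frac{d}{dt}\left(e^{\lambda_j t}\, p_j(t)\right) = \lambda_{j-1}\, e^{\lambda_j t}\, p_{j-1}(t).$$

By integration,

$$p_j(t) = \lambda_{j-1}\, e^{-\lambda_j t} \int_0^t e^{\lambda_j x}\, p_{j-1}(x)\, dx. \tag{9.38}$$

Formulas (9.37) and (9.38) allow the successive calculation of the probabilities $p_j(t)$
for $j = m+1, m+2, \ldots$. For instance, on conditions $p_0(0) = 1$ and $\lambda_0 \ne \lambda_1$,

$$p_1(t) = \lambda_0\, e^{-\lambda_1 t} \int_0^t e^{\lambda_1 x}\, e^{-\lambda_0 x}\, dx$$
$$= \lambda_0\, e^{-\lambda_1 t} \int_0^t e^{-(\lambda_0 - \lambda_1) x}\, dx$$
$$= \frac{\lambda_0}{\lambda_0 - \lambda_1} \left(e^{-\lambda_1 t} - e^{-\lambda_0 t}\right), \quad t \ge 0.$$

If all the birth rates are different from each other, then this result and (9.38) yield by
induction:

$$p_j(t) = \sum_{i=0}^{j} C_{ij}\, \lambda_i\, e^{-\lambda_i t}, \quad j = 0, 1, \ldots,$$

where

$$C_{ij} = \frac{1}{\lambda_j} \prod_{k=0,\, k \ne i}^{j} \frac{\lambda_k}{\lambda_k - \lambda_i}, \quad 0 \le i \le j, \quad C_{00} = \frac{1}{\lambda_0}.$$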
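For pairwise different birth rates, the state probabilities of a birth process started in state 0 are sums of exponential terms whose coefficients are products over the rate differences. The following Python sketch evaluates this product formula; the function name and the rate values are illustrative assumptions.

```python
import math


def birth_state_probability(rates, j, t):
    """p_j(t) of a pure birth process with p_0(0) = 1.

    rates[k] is the birth rate of state k; rates[0], ..., rates[j] are
    assumed to be positive and pairwise different.
    """
    total = 0.0
    for i in range(j + 1):
        c = 1.0 / rates[j]
        for k in range(j + 1):
            if k != i:
                c *= rates[k] / (rates[k] - rates[i])
        total += c * rates[i] * math.exp(-rates[i] * t)
    return total


rates = [1.0, 2.0, 3.5]  # assumed birth rates of states 0, 1, 2
t = 0.7
print([birth_state_probability(rates, j, t) for j in range(3)])
# for j = 1 the formula reduces to the two-term expression for p_1(t)
print(rates[0] / (rates[0] - rates[1]) * (math.exp(-rates[1] * t) - math.exp(-rates[0] * t)))
```

Equal birth rates are excluded here, since the coefficients then contain a division by zero; that case has to be handled by a limiting argument or directly by (9.38).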


Linear Birth Process A birth process is called a linear birth process or a Yule-Furry
process (see Furry (1937) and Yule (1924)) if its birth rates are given by

$$\lambda_i = i\lambda; \quad i = 0, 1, 2, \ldots$$

Since state 0 is absorbing, an initial distribution should not concentrate probability 1
on state 0. Linear birth processes occur, for instance, if in the interval $[t, t+h]$ each
member of a population (bacterium, physical particle), independently of the others,
splits with probability $\lambda h + o(h)$ as $h \to 0$.

Assuming $p_1(0) = P(X(0) = 1) = 1$, the system of differential equations (9.36) becomes

$$p_j'(t) = -\lambda\, [\, j\, p_j(t) - (j-1)\, p_{j-1}(t)\, ]; \quad j = 1, 2, \ldots \tag{9.39}$$

with

$$p_1(0) = 1, \quad p_j(0) = 0; \quad j = 2, 3, \ldots \tag{9.40}$$

The solution of (9.39) given the initial distribution (9.40) is

$$p_j(t) = e^{-\lambda t} \left(1 - e^{-\lambda t}\right)^{j-1}; \quad j = 1, 2, \ldots$$

Thus, $X(t)$ has a geometric distribution with parameter $p = e^{-\lambda t}$. Hence, the trend
function of the linear birth process is

$$m(t) = e^{\lambda t}, \quad t \ge 0.$$


If $\mathbf{Z}$ is finite, then there always exists a solution of (9.36) which satisfies

$$\sum_{i \in \mathbf{Z}} p_i(t) = 1. \tag{9.41}$$

In case of an infinite state space $\mathbf{Z} = \{0, 1, \ldots\}$, the following theorem gives a
necessary and sufficient condition for the existence of a solution of (9.36) with
property (9.41). Without loss of generality, the theorem is proved on condition (9.40).


Theorem 9.2 (Feller-Lundberg) A solution $\{p_0(t), p_1(t), \ldots\}$ of the system of
differential equations (9.36) satisfies condition (9.41) if and only if the series

$$\sum_{i=0}^{\infty} \frac{1}{\lambda_i} \tag{9.42}$$

diverges.

Proof Let

$$s_k(t) = p_0(t) + p_1(t) + \cdots + p_k(t).$$

Summing the middle equation of (9.36) from $j = 1$ to $k$ yields

$$s_k'(t) = -\lambda_k\, p_k(t).$$

By integration, taking into account $s_k(0) = 1$,

$$1 - s_k(t) = \lambda_k \int_0^t p_k(x)\, dx. \tag{9.43}$$

Since $s_k(t)$ is monotonically increasing as $k \to \infty$, the following limit exists:

$$r(t) = \lim_{k \to \infty} (1 - s_k(t)).$$

From (9.43),

$$\lambda_k \int_0^t p_k(x)\, dx \ge r(t).$$

Dividing by $\lambda_k$ and summing the arising inequalities from 0 to $k$ gives

$$\int_0^t s_k(x)\, dx \ge r(t) \left(\frac{1}{\lambda_0} + \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_k}\right).$$

Since $s_k(t) \le 1$ for all $t \ge 0$,

$$t \ge r(t) \left(\frac{1}{\lambda_0} + \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_k}\right).$$

If the series (9.42) diverges, then this inequality implies that $r(t) = 0$ for all $t > 0$. But
this result is equivalent to (9.41).

Conversely, from (9.43),

$$\lambda_k \int_0^t p_k(x)\, dx \le 1$$

so that

$$\int_0^t s_k(x)\, dx \le \frac{1}{\lambda_0} + \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_k}.$$

Passing to the limit as $k \to \infty$,

$$\int_0^t (1 - r(x))\, dx \le \sum_{i=0}^{\infty} \frac{1}{\lambda_i}.$$

If $r(t) \equiv 0$, the left-hand side of this inequality is equal to $t$. Since $t$ can be arbitrarily
large, the series (9.42) must diverge. This result completes the proof. ■


According to this theorem, it is theoretically possible that within a finite interval
$[0, t]$ the population grows beyond all finite bounds. The probability of such an
explosive growth is

$$1 - \sum_{i=0}^{\infty} p_i(t).$$

This probability is positive if the birth rates grow so fast that the series (9.42)
converges. For example, an explosive growth would occur if

$$\lambda_i = i^2 \lambda; \quad i = 1, 2, \ldots$$

since

$$\sum_{i=1}^{\infty} \frac{1}{\lambda_i} = \frac{1}{\lambda} \sum_{i=1}^{\infty} \frac{1}{i^2} = \frac{\pi^2}{6\lambda} < \infty.$$

It is remarkable that an explosive growth occurs in an arbitrarily small time interval,
since the convergence of the series (9.42) does not depend on $t$.


9.6.2 Death Processes 


A continuous-time Markov chain with state space $\mathbf{Z} = \{0, 1, \ldots\}$ is called a (pure)
death process if, for all $i = 1, 2, \ldots$, only transitions from state $i$ to $i-1$ are possible.
State 0 is absorbing.

Thus, the positive transition rates of pure death processes are given by $q_{i,i-1}$, $i \ge 1$.
In what follows, these transition rates will be called death rates and denoted as

$$\mu_0 = 0, \quad \mu_i = q_{i,i-1}; \quad i = 1, 2, \ldots$$

The sample paths of such processes are non-increasing step functions. For pure death
processes, on condition

$$p_n(0) = P(X(0) = n) = 1,$$

the system of differential equations (9.20) becomes

$$p_n'(t) = -\mu_n\, p_n(t),$$
$$p_j'(t) = -\mu_j\, p_j(t) + \mu_{j+1}\, p_{j+1}(t); \quad j = 0, 1, \ldots, n-1. \tag{9.44}$$

The solution of the first differential equation is

$$p_n(t) = e^{-\mu_n t}, \quad t \ge 0.$$

Integrating (9.44) yields

$$p_j(t) = \mu_{j+1}\, e^{-\mu_j t} \int_0^t e^{\mu_j x}\, p_{j+1}(x)\, dx; \quad j = n-1, \ldots, 1, 0. \tag{9.45}$$

Starting with $p_n(t)$, the probabilities

$$p_j(t); \quad j = n-1, n-2, \ldots, 0,$$

can be recursively determined from (9.45). For instance, assuming $\mu_n \ne \mu_{n-1}$,

$$p_{n-1}(t) = \mu_n\, e^{-\mu_{n-1} t} \int_0^t e^{-(\mu_n - \mu_{n-1}) x}\, dx$$
$$= \frac{\mu_n}{\mu_n - \mu_{n-1}} \left(e^{-\mu_{n-1} t} - e^{-\mu_n t}\right).$$

More generally, if all the death rates are different from each other, then

$$p_j(t) = \sum_{i=j}^{n} D_{ij}\, \mu_i\, e^{-\mu_i t}, \quad 0 \le j \le n, \tag{9.46}$$

where

$$D_{ij} = \frac{1}{\mu_j} \prod_{k=j,\, k \ne i}^{n} \frac{\mu_k}{\mu_k - \mu_i}, \quad j \le i \le n, \quad D_{nn} = \frac{1}{\mu_n}.$$

Linear Death Process A death process $\{X(t),\ t \ge 0\}$ is called a linear death process
if for a positive parameter $\mu$ it has death rates

$$\mu_i = i\mu; \quad i = 0, 1, \ldots$$

Given the initial distribution

$$p_n(0) = P(X(0) = n) = 1,$$

the process stays in state $n$ for an exponentially distributed time with parameter $n\mu$:

$$p_n(t) = e^{-n\mu t}, \quad t \ge 0.$$

Starting with $p_n(t)$, one obtains inductively from (9.45) or simply from (9.46):

$$p_i(t) = \binom{n}{i}\, e^{-i\mu t} \left(1 - e^{-\mu t}\right)^{n-i}; \quad i = 0, 1, \ldots, n.$$

Hence, $X(t)$ has a binomial distribution with parameters $n$ and $p = e^{-\mu t}$ so that the
trend function of a linear death process is

$$m(t) = n\, e^{-\mu t}, \quad t \ge 0.$$

Example 9.10 A system consisting of $n$ subsystems starts operating at time $t = 0$.
The lifetimes of the subsystems are independent, exponentially distributed random
variables with parameter $\lambda$. If $X(t)$ denotes the number of subsystems still working
at time $t$, then $\{X(t),\ t \ge 0\}$ is a linear death process with death rates

$$\mu_i = i\lambda; \quad i = 0, 1, \ldots \quad □$$
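For a system as in example 9.10, the distribution of the number of subsystems still working at time $t$ is binomial and can be tabulated directly. The following Python sketch does this; the function name and the parameter values are assumptions for illustration only.

```python
from math import comb, exp


def linear_death_pmf(n, rate, t):
    """Distribution of X(t) for a linear death process with X(0) = n.

    Each of the n individuals survives up to time t with probability
    p = exp(-rate * t), independently of the others.
    """
    p = exp(-rate * t)
    return [comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(n + 1)]


n, rate, t = 5, 0.3, 2.0  # assumed values
pmf = linear_death_pmf(n, rate, t)
print(pmf)
print("trend function:", sum(i * q for i, q in enumerate(pmf)), n * exp(-rate * t))
```

The last line compares the mean of the tabulated distribution with the trend function $n e^{-\mu t}$ (here with $\mu$ replaced by the assumed rate).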


9.6.3 Birth and Death Processes


9.6.3.1 Time-Dependent State Probabilities
A continuous-time Markov chain $\{X(t),\ t \ge 0\}$ with state space

$$\mathbf{Z} = \{0, 1, \ldots, n\}, \quad n \le \infty,$$

is called a birth and death process if from any state $i$ only a transition to state $i-1$ or
to state $i+1$ is possible, provided that $i-1 \in \mathbf{Z}$ and $i+1 \in \mathbf{Z}$, respectively.

Therefore, the transition rates of a birth and death process have the property

$$q_{ij} = 0 \quad \text{for } |i-j| > 1.$$

The transition rates $\lambda_i = q_{i,i+1}$ and $\mu_i = q_{i,i-1}$ are called birth rates and death rates,
respectively. According to the restrictions given by the state space, $\lambda_n = 0$ for $n < \infty$
and $\mu_0 = 0$ (Figure 9.10). Hence, a birth process (death process) is a birth and death
process, the death rates (birth rates) of which are equal to 0. If a birth and death
process describes the number of individuals in a population of organisms, then, when
arriving at state 0, the population is extinguished. Thus, without the possibility of
immigration, state 0 is absorbing ($\lambda_0 = 0$).

Figure 9.10 Transition graph of the birth and death process

According to (9.20), the absolute state probabilities $p_j(t) = P(X(t) = j)$, $j \in \mathbf{Z}$, of a
birth and death process satisfy the system of linear differential equations

$$p_0'(t) = -\lambda_0\, p_0(t) + \mu_1\, p_1(t),$$
$$p_j'(t) = +\lambda_{j-1}\, p_{j-1}(t) - (\lambda_j + \mu_j)\, p_j(t) + \mu_{j+1}\, p_{j+1}(t), \quad j = 1, 2, \ldots, \tag{9.47}$$
$$p_n'(t) = +\lambda_{n-1}\, p_{n-1}(t) - \mu_n\, p_n(t), \quad n < \infty.$$
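For a finite state space, the system (9.47) can be integrated numerically. The following Python sketch uses the classical Runge-Kutta scheme; the function names, the step count and the example rates (those of the repairman problem with three machines, $\lambda = 1$ and $\mu = 2$) are assumptions made for illustration.

```python
def birth_death_rhs(p, lam, mu):
    """Right-hand side of (9.47) for the finite state space {0, ..., n}.

    lam[j] and mu[j] are the birth and death rates of state j; the caller
    is expected to pass lam[n] = 0 and mu[0] = 0.
    """
    n = len(p) - 1
    dp = []
    for j in range(n + 1):
        gain_from_below = lam[j - 1] * p[j - 1] if j > 0 else 0.0
        gain_from_above = mu[j + 1] * p[j + 1] if j < n else 0.0
        dp.append(gain_from_below - (lam[j] + mu[j]) * p[j] + gain_from_above)
    return dp


def state_probabilities(p0, lam, mu, t, steps=2000):
    """Integrate (9.47) from 0 to t with the classical RK4 scheme."""
    def shifted(v, k, s):
        return [a + s * b for a, b in zip(v, k)]

    h = t / steps
    p = list(p0)
    for _ in range(steps):
        k1 = birth_death_rhs(p, lam, mu)
        k2 = birth_death_rhs(shifted(p, k1, 0.5 * h), lam, mu)
        k3 = birth_death_rhs(shifted(p, k2, 0.5 * h), lam, mu)
        k4 = birth_death_rhs(shifted(p, k3, h), lam, mu)
        p = [a + h * (b + 2 * c + 2 * d + e) / 6
             for a, b, c, d, e in zip(p, k1, k2, k3, k4)]
    return p


# repairman problem with n = 3 machines: lam_j = (3 - j) * 1.0, mu_j = 2.0 for j >= 1
lam = [3.0, 2.0, 1.0, 0.0]
mu = [0.0, 2.0, 2.0, 2.0]
print(state_probabilities([1.0, 0.0, 0.0, 0.0], lam, mu, t=20.0))
```

For large $t$ the computed probabilities settle at the stationary distribution of the chain.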
In the following two examples, the state probabilities $\{p_0(t), p_1(t), \ldots\}$ of two
important birth and death processes are determined via their respective z-transforms

$$M(t, z) = \sum_{i=0}^{\infty} p_i(t)\, z^i$$

under initial conditions of type

$$p_n(0) = P(X(0) = n) = 1.$$

In terms of the z-transform, this condition is equivalent to

$$M(0, z) = z^n, \quad n = 0, 1, \ldots$$

Furthermore, partial derivatives of the z-transforms will be needed:

$$\frac{\partial M(t, z)}{\partial t} = \sum_{i=0}^{\infty} p_i'(t)\, z^i \quad \text{and} \quad \frac{\partial M(t, z)}{\partial z} = \sum_{i=1}^{\infty} i\, p_i(t)\, z^{i-1}. \tag{9.48}$$

Partial differential equations for $M(t, z)$ will be established and solved by applying
the characteristic method.


Example 9.11 (linear birth and death process) $\{X(t), t \ge 0\}$ is called a linear birth and death process if it has transition rates
$$\lambda_i = i\lambda, \quad \mu_i = i\mu, \quad i = 0, 1, \ldots.$$
In what follows, this process is analyzed on condition that
$$p_1(0) = P(X(0) = 1) = 1.$$
Assuming $p_0(0) = 1$ would make no sense since state 0 is absorbing. The system of differential equations (9.20) becomes
$$
\begin{aligned}
p_0'(t) &= \mu\, p_1(t),\\
p_j'(t) &= (j-1)\lambda\, p_{j-1}(t) - j(\lambda+\mu)\, p_j(t) + (j+1)\mu\, p_{j+1}(t); \quad j = 1, 2, \ldots. \qquad (9.49)
\end{aligned}
$$

Multiplying the $j$-th differential equation by $z^j$ and summing from $j = 0$ to $j = \infty$, taking into account (9.48), yields the following linear, homogeneous partial differential equation for $M(t,z)$:
$$\frac{\partial M(t,z)}{\partial t} - (z-1)(\lambda z - \mu)\,\frac{\partial M(t,z)}{\partial z} = 0. \qquad (9.50)$$


The corresponding (ordinary) characteristic differential equation is a Riccati differential equation with constant coefficients:
$$\frac{dz}{dt} = -(z-1)(\lambda z-\mu) = -\lambda z^2 + (\lambda+\mu)\,z - \mu. \qquad (9.51)$$

a) $\lambda \ne \mu$ By separation of variables, (9.51) can be written in the form
$$\frac{dz}{(z-1)(\lambda z-\mu)} = -dt.$$
Integration on both sides of this equation yields
$$-\frac{1}{\lambda-\mu}\,\ln\!\left(\frac{\lambda z-\mu}{z-1}\right) = -t + C.$$
The general solution $z = z(t)$ of the characteristic differential equation in implicit form is, therefore, given by
$$c = (\lambda-\mu)\,t - \ln\!\left(\frac{\lambda z-\mu}{z-1}\right),$$
where $c$ is an arbitrary constant. Thus, the general solution $M(t,z)$ of (9.50) has the structure
$$M(t,z) = f\!\left((\lambda-\mu)\,t - \ln\!\left(\frac{\lambda z-\mu}{z-1}\right)\right),$$


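where $f$ can be any function with a continuous derivative. $f$ can be determined by making use of the initial condition $p_1(0) = 1$ or, equivalently, $M(0,z) = z$. Since
$$f\!\left(\ln\frac{z-1}{\lambda z-\mu}\right) = z,$$
$f$ must have the structure
$$f(x) = \frac{\mu e^{x} - 1}{\lambda e^{x} - 1}.$$
Thus, $M(t,z)$ is
$$M(t,z) = \frac{\mu \exp\left\{(\lambda-\mu)\,t - \ln\left(\frac{\lambda z-\mu}{z-1}\right)\right\} - 1}{\lambda \exp\left\{(\lambda-\mu)\,t - \ln\left(\frac{\lambda z-\mu}{z-1}\right)\right\} - 1}.$$
After simplification, $M(t,z)$ becomes
$$M(t,z) = \frac{\mu\,[1 - e^{(\lambda-\mu)t}] - [\lambda - \mu e^{(\lambda-\mu)t}]\, z}{[\mu - \lambda e^{(\lambda-\mu)t}] - \lambda\,[1 - e^{(\lambda-\mu)t}]\, z}.$$

This representation of $M(t,z)$ allows its expansion as a power series in $z$. The coefficient of $z^j$ is the desired absolute state probability $p_j(t)$. Letting $\rho = \lambda/\mu$ yields
$$p_0(t) = \frac{1 - e^{(\lambda-\mu)t}}{1 - \rho\, e^{(\lambda-\mu)t}},$$
$$p_j(t) = (1-\rho)^2 \rho^{\,j-1}\, \frac{[1 - e^{(\lambda-\mu)t}]^{\,j-1}}{[1 - \rho\, e^{(\lambda-\mu)t}]^{\,j+1}}\; e^{(\lambda-\mu)t}, \quad j = 1, 2, \ldots.$$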


Since state 0 is absorbing, $p_0(t)$ is the probability that the population is extinct at time $t$. Moreover,
$$\lim_{t\to\infty} p_0(t) = \begin{cases} 1 & \text{for } \lambda < \mu,\\ \mu/\lambda & \text{for } \lambda > \mu. \end{cases}$$
Thus, for $\lambda > \mu$ the population will survive to infinity with positive probability $1 - \mu/\lambda$. If $\lambda < \mu$, the population will eventually disappear with probability 1. In the latter case, the distribution function of the lifetime $L$ of the population is
$$P(L \le t) = p_0(t) = \frac{1 - e^{(\lambda-\mu)t}}{1 - \rho\, e^{(\lambda-\mu)t}}, \quad t \ge 0.$$
Hence, the population will survive the interval $[0, t]$ with probability
$$P(L > t) = 1 - p_0(t).$$
From this, applying formula (2.52), page 64,
$$E(L) = \frac{1}{\lambda}\,\ln\!\left(\frac{\mu}{\mu-\lambda}\right).$$
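The mean lifetime can be checked by integrating the survival probability directly. The following short derivation is a sketch that assumes $\lambda < \mu$ and uses only $P(L > t) = 1 - p_0(t)$ with $\rho = \lambda/\mu$:

$$
\begin{aligned}
E(L) &= \int_0^\infty [1 - p_0(t)]\, dt = \int_0^\infty \frac{(1-\rho)\, e^{-(\mu-\lambda)t}}{1 - \rho\, e^{-(\mu-\lambda)t}}\, dt\\
&= \frac{1-\rho}{\rho\,(\mu-\lambda)} \Big[ \ln\!\big(1 - \rho\, e^{-(\mu-\lambda)t}\big) \Big]_0^\infty = -\frac{1}{\lambda}\,\ln(1-\rho).
\end{aligned}
$$

As a plausibility check, for $\lambda \to 0$ this expression tends to $1/\mu$, the mean of a single exponentially distributed lifetime.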
The trend function $m(t) = E(X(t))$ is principally given by
$$m(t) = \sum_{j=0}^{\infty} j\, p_j(t).$$
By formula (2.112), page 96, $m(t)$ can also be obtained from the $z$-transform:
$$m(t) = \left.\frac{\partial M(t,z)}{\partial z}\right|_{z=1}.$$

If only the trend function of the process is of interest, then here, as in many other cases, knowledge of the $z$-transform or the absolute state distribution is not necessary, since $m(t)$ can be determined from the respective system of differential equations (9.47). In this example, multiplying the $j$-th differential equation of (9.49) by $j$ and summing from $j = 0$ to $\infty$ yields the following first-order differential equation:
$$m'(t) = (\lambda-\mu)\, m(t). \qquad (9.52)$$
Taking into account the initial condition $p_1(0) = 1$, its solution is
$$m(t) = e^{(\lambda-\mu)t}.$$
By multiplying the $j$-th differential equation of (9.47) by $j^2$ and summing from $j = 0$ to $\infty$, a second order differential equation for $Var(X(t))$ is obtained. Its solution is
$$Var(X(t)) = \frac{\lambda+\mu}{\lambda-\mu}\,\big[1 - e^{-(\lambda-\mu)t}\big]\, e^{2(\lambda-\mu)t}.$$
Of course, since $M(t,z)$ is known, $Var(X(t))$ can be obtained from (2.112), too.

If the linear birth and death process starts in a state $s = 2, 3, \ldots$, no additional problems arise in principle up to the determination of $M(t,z)$. But it will be more complicated to expand $M(t,z)$ as a power series in $z$. The corresponding trend function, however, is easily obtained as the solution of (9.52) with the initial condition $p_s(0) = P(X(0) = s) = 1$:
$$m(t) = s\, e^{(\lambda-\mu)t}, \quad t \ge 0.$$
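The state probabilities of case a) are easy to evaluate numerically. The following Python sketch assumes $\lambda \ne \mu$ and $X(0) = 1$; the function name `linear_bd_probs` and the parameter values are illustrative. It can be used to check that the probabilities sum to 1 and that their mean agrees with the trend function $m(t) = e^{(\lambda-\mu)t}$.

```python
import math


def linear_bd_probs(lam, mu, t, j_max):
    """Return [p_0(t), ..., p_{j_max}(t)] for the linear birth and death
    process with lam != mu, started in state 1."""
    rho = lam / mu
    e = math.exp((lam - mu) * t)
    denom = 1.0 - rho * e
    probs = [(1.0 - e) / denom]                      # extinction probability
    for j in range(1, j_max + 1):
        probs.append((1.0 - rho) ** 2 * rho ** (j - 1)
                     * (1.0 - e) ** (j - 1) / denom ** (j + 1) * e)
    return probs


probs = linear_bd_probs(lam=1.0, mu=2.0, t=1.5, j_max=200)
mean = sum(j * p for j, p in enumerate(probs))
print(sum(probs), mean, math.exp((1.0 - 2.0) * 1.5))
```

The truncation at `j_max = 200` is harmless here because the probabilities decay geometrically in $j$.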


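b) $\lambda = \mu$ In this case, the characteristic differential equation (9.51) simplifies to
$$\frac{dz}{\lambda\,(z-1)^2} = -dt.$$
Integration yields
$$c = \lambda t - \frac{1}{z-1},$$
where $c$ is an arbitrary constant. Therefore, $M(t,z)$ has the structure
$$M(t,z) = f\!\left(\lambda t - \frac{1}{z-1}\right),$$
where $f$ is a continuously differentiable function. Since $p_1(0) = 1$, $f$ satisfies
$$f\!\left(-\frac{1}{z-1}\right) = z.$$
Hence, the desired function $f$ is given by
$$f(x) = 1 - \frac{1}{x}, \quad x \ne 0.$$
The corresponding $z$-transform is
$$M(t,z) = \frac{\lambda t + (1-\lambda t)\, z}{1 + \lambda t - \lambda t\, z}.$$
Expanding $M(t,z)$ as a power series in $z$ yields the absolute state probabilities:
$$p_0(t) = \frac{\lambda t}{1+\lambda t}, \quad p_j(t) = \frac{(\lambda t)^{j-1}}{(1+\lambda t)^{j+1}}; \quad j = 1, 2, \ldots, \ t \ge 0.$$
An equivalent form of the absolute state probabilities is
$$p_0(t) = \frac{\lambda t}{1+\lambda t}, \quad p_j(t) = [1 - p_0(t)]^2\, [p_0(t)]^{j-1}; \quad j = 1, 2, \ldots, \ t \ge 0.$$
Mean value and variance of $X(t)$ are
$$E(X(t)) = 1, \quad Var(X(t)) = 2\lambda t.$$

This example shows that the analysis of apparently simple birth and death processes requires some effort. □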


Example 9.12 Consider a birth and death process with transition rates
$$\lambda_i = \lambda, \quad \mu_i = i\mu; \quad i = 0, 1, \ldots$$
and initial distribution $p_0(0) = P(X(0) = 0) = 1$.

The corresponding system of linear differential equations (9.47) is
$$
\begin{aligned}
p_0'(t) &= \mu\, p_1(t) - \lambda\, p_0(t),\\
p_j'(t) &= \lambda\, p_{j-1}(t) - (\lambda + \mu j)\, p_j(t) + (j+1)\mu\, p_{j+1}(t); \quad j = 1, 2, \ldots. \qquad (9.53)
\end{aligned}
$$

Multiplying the $j$-th equation by $z^j$ and summing from $j = 0$ to $\infty$ yields a homogeneous linear partial differential equation for the $z$-transform $M(t,z)$:
$$\frac{\partial M(t,z)}{\partial t} + \mu\,(z-1)\,\frac{\partial M(t,z)}{\partial z} = \lambda\,(z-1)\, M(t,z). \qquad (9.54)$$
The corresponding system of characteristic differential equations is
$$\frac{dz}{dt} = \mu\,(z-1), \quad \frac{dM(t,z)}{dt} = \lambda\,(z-1)\, M(t,z).$$
After separation of variables and subsequent integration, the first differential equation yields
$$c_1 = \ln(z-1) - \mu t$$
with an arbitrary constant $c_1$. By combining both differential equations and letting $\rho = \lambda/\mu$,
$$\frac{dM(t,z)}{M(t,z)} = \rho\, dz.$$
Integration yields
$$c_2 = \ln M(t,z) - \rho z,$$
where $c_2$ is an arbitrary constant. As a solution of (9.54), $M(t,z)$ satisfies $c_2 = f(c_1)$ with an arbitrary continuous function $f$, i.e. $M(t,z)$ satisfies
$$\ln M(t,z) - \rho z = f(\ln(z-1) - \mu t).$$
Therefore,
$$M(t,z) = \exp\{ f(\ln(z-1) - \mu t) + \rho z \}.$$
Since condition $p_0(0) = 1$ is equivalent to $M(0,z) \equiv 1$, $f$ is implicitly given by
$$f(\ln(z-1)) = -\rho z.$$
Hence, the explicit representation of $f$ is
$$f(x) = -\rho\,(e^{x} + 1).$$
Thus,
$$M(t,z) = \exp\{ -\rho\,(e^{\ln(z-1) - \mu t} + 1) + \rho z \}.$$
Equivalently,
$$M(t,z) = e^{-\rho\,(1 - e^{-\mu t})} \cdot e^{\rho\,(1 - e^{-\mu t})\, z}.$$

Now it is easy to expand $M(t,z)$ as a power series in $z$. The coefficients of $z^j$ are
$$p_j(t) = \frac{[\rho\,(1 - e^{-\mu t})]^j}{j!}\; e^{-\rho\,(1 - e^{-\mu t})}; \quad j = 0, 1, \ldots. \qquad (9.55)$$
This is a Poisson distribution with intensity function $\rho\,(1 - e^{-\mu t})$. Therefore, this birth and death process has trend function
$$m(t) = \rho\,(1 - e^{-\mu t}).$$
For $t \to \infty$ the absolute state probabilities $p_j(t)$ converge to the stationary state probabilities:
$$\pi_j = \lim_{t\to\infty} p_j(t) = \frac{\rho^j}{j!}\, e^{-\rho}; \quad j = 0, 1, \ldots.$$


If the process starts at a state $s > 0$, the absolute state probability distribution is not Poisson. In this case this distribution has a rather complicated structure, which will not be presented here. Instead, the system of linear differential equations (9.53) can be used to establish ordinary differential equations for the trend function $m(t)$ and the variance of $X(t)$. Given the initial distribution $p_s(0) = 1$, $s = 1, 2, \ldots$, their respective solutions are
$$m(t) = \rho\,(1 - e^{-\mu t}) + s\, e^{-\mu t},$$
$$Var(X(t)) = (1 - e^{-\mu t})\,(\rho + s\, e^{-\mu t}).$$

The birth and death process considered in this example is of importance in queueing theory (section 9.7). □
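The time-dependent Poisson distribution (9.55) can be evaluated directly. The Python sketch below assumes the initial state 0; the function name `poisson_state_probs` and the parameter values are illustrative only. It checks numerically that the mean of the distribution equals the trend function $\rho\,(1 - e^{-\mu t})$.

```python
import math


def poisson_state_probs(lam, mu, t, j_max):
    """Return [p_0(t), ..., p_{j_max}(t)] according to (9.55), X(0) = 0."""
    a = (lam / mu) * (1.0 - math.exp(-mu * t))   # intensity function at time t
    return [math.exp(-a) * a ** j / math.factorial(j) for j in range(j_max + 1)]


lam, mu, t = 2.0, 1.0, 0.7
probs = poisson_state_probs(lam, mu, t, j_max=60)
trend = (lam / mu) * (1.0 - math.exp(-mu * t))
print(sum(probs), sum(j * p for j, p in enumerate(probs)), trend)
```

For large $t$ the returned values approach the stationary Poisson probabilities with parameter $\rho$.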


Example 9.13 (birth and death process with immigration) For positive parameters $\lambda$, $\mu$, and $\nu$, let transition rates be given by
$$\lambda_i = i\lambda + \nu, \quad \mu_i = i\mu; \quad i = 0, 1, \ldots.$$

If this model is used to describe the development in time of a population, then each individual will produce a new individual in $[t, t+\Delta t]$ with probability $\lambda\,\Delta t + o(\Delta t)$ or leave the population in this interval with probability $\mu\,\Delta t + o(\Delta t)$. Moreover, due to immigration from outside, the population will increase by one individual in $[t, t+\Delta t]$ with probability $\nu\,\Delta t + o(\Delta t)$. Thus, if $X(t) = i$, the probability that the population will increase or decrease by one individual in the interval $[t, t+\Delta t]$ is
$$(i\lambda + \nu)\,\Delta t + o(\Delta t) \quad \text{or} \quad i\mu\,\Delta t + o(\Delta t),$$


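respectively. These probabilities do not depend on $t$ and refer to $\Delta t \to 0$. As in the previous example, state 0 is not absorbing.

The differential equations (9.47) become
$$
\begin{aligned}
p_0'(t) &= \mu\, p_1(t) - \nu\, p_0(t),\\
p_j'(t) &= (\lambda(j-1) + \nu)\, p_{j-1}(t) + \mu(j+1)\, p_{j+1}(t) - (\lambda j + \nu + \mu j)\, p_j(t); \quad j = 1, 2, \ldots.
\end{aligned}
$$

Analogously to the previous examples, the $z$-transform $M(t,z)$ of the probability distribution $\{p_0(t), p_1(t), \ldots\}$ is seen to satisfy the partial differential equation
$$\frac{\partial M(t,z)}{\partial t} = (\lambda z - \mu)(z-1)\,\frac{\partial M(t,z)}{\partial z} + \nu\,(z-1)\, M(t,z). \qquad (9.56)$$

The system of the characteristic differential equations belonging to (9.56) is
$$\frac{dz}{dt} = -(\lambda z - \mu)(z-1), \quad \frac{dM(t,z)}{dt} = \nu\,(z-1)\, M(t,z).$$

From this, with the initial condition $p_0(0) = 1$ or, equivalently, $M(0,z) \equiv 1$, the solution is obtained analogously to the previous example:
$$M(t,z) = \left\{ \frac{\lambda - \mu}{\lambda e^{(\lambda-\mu)t} - \mu - \lambda\,[e^{(\lambda-\mu)t} - 1]\, z} \right\}^{\nu/\lambda} \quad \text{for } \lambda \ne \mu,$$
$$M(t,z) = (1 + \lambda t)^{-\nu/\lambda} \left\{ 1 - \frac{\lambda t}{1 + \lambda t}\, z \right\}^{-\nu/\lambda} \quad \text{for } \lambda = \mu.$$

Generally it is not possible to expand $M(t,z)$ as a power series in $z$. But the absolute state probabilities $p_i(t)$ can be obtained by differentiation of $M(t,z)$:
$$p_i(t) = \frac{1}{i!} \left.\frac{\partial^i M(t,z)}{\partial z^i}\right|_{z=0} \quad \text{for } i = 1, 2, \ldots.$$
The trend function
$$m(t) = E(X(t)) = \left.\frac{\partial M(t,z)}{\partial z}\right|_{z=1}$$
of this birth and death process is
$$m(t) = \frac{\nu}{\lambda-\mu}\,\big[e^{(\lambda-\mu)t} - 1\big] \quad \text{for } \lambda \ne \mu, \qquad (9.57)$$
$$m(t) = \nu t \quad \text{for } \lambda = \mu.$$

If $\lambda < \mu$, the limit as $t \to \infty$ of the $z$-transform exists:
$$\lim_{t\to\infty} M(t,z) = \left(1 - \frac{\lambda}{\mu}\right)^{\nu/\lambda} \left(1 - \frac{\lambda}{\mu}\, z\right)^{-\nu/\lambda}.$$

For $\lambda < \mu$, the trend function (9.57) tends to a positive limit as $t \to \infty$:
$$\lim_{t\to\infty} m(t) = \frac{\nu}{\mu-\lambda} \quad \text{for } \lambda < \mu. \quad □$$

9.6.3.2 Stationary State Probabilities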


By (9.27), in case of its existence the stationary distribution $\{\pi_0, \pi_1, \ldots\}$ of a birth and death process satisfies the following system of linear algebraic equations:
$$
\begin{aligned}
\lambda_0 \pi_0 - \mu_1 \pi_1 &= 0,\\
\lambda_{j-1}\pi_{j-1} - (\lambda_j + \mu_j)\,\pi_j + \mu_{j+1}\pi_{j+1} &= 0, \quad j = 1, 2, \ldots, \qquad (9.58)\\
\lambda_{n-1}\pi_{n-1} - \mu_n \pi_n &= 0, \quad n < \infty.
\end{aligned}
$$
This system is equivalent to the following one:
$$
\begin{aligned}
\mu_1 \pi_1 &= \lambda_0 \pi_0,\\
\mu_{j+1}\pi_{j+1} + \lambda_{j-1}\pi_{j-1} &= (\lambda_j + \mu_j)\,\pi_j; \quad j = 1, 2, \ldots, \qquad (9.59)\\
\mu_n \pi_n &= \lambda_{n-1}\pi_{n-1}, \quad n < \infty.
\end{aligned}
$$
Provided a solution exists, the general solution of (9.58) can be obtained as follows: Let
$$d_j = -\lambda_j \pi_j + \mu_{j+1}\pi_{j+1}; \quad j = 0, 1, \ldots.$$
Then the system (9.58) simplifies to
$$d_0 = 0, \quad d_j - d_{j-1} = 0, \ j = 1, 2, \ldots, \quad d_{n-1} = 0, \ n < \infty.$$

Starting with $j = 0$, one successively obtains
$$\pi_j = \prod_{i=1}^{j} \frac{\lambda_{i-1}}{\mu_i}\; \pi_0; \quad j = 1, 2, \ldots, n. \qquad (9.60)$$
1) If $n < \infty$, then the stationary state probabilities satisfy the normalizing condition
$$\sum_{i=0}^{n} \pi_i = 1.$$
Solving for $\pi_0$ yields
$$\pi_0 = \left[ 1 + \sum_{j=1}^{n} \prod_{i=1}^{j} \frac{\lambda_{i-1}}{\mu_i} \right]^{-1}. \qquad (9.61)$$
2) If $n = \infty$, then equation (9.61) shows that the convergence of the series
$$\sum_{j=1}^{\infty} \prod_{i=1}^{j} \frac{\lambda_{i-1}}{\mu_i} \qquad (9.62)$$
is necessary for the existence of a stationary distribution. A sufficient condition for the convergence of this series is the existence of a positive integer $N$ such that
$$\frac{\lambda_{i-1}}{\mu_i} \le \alpha < 1 \quad \text{for all } i > N. \qquad (9.63)$$
Intuitively, this condition is not surprising: If the birth rates are greater than the corresponding death rates, the process will drift to infinity with probability 1. But this excludes the existence of a stationary distribution of the process. For a proof of the following theorem see Karlin and Taylor (1981).

Theorem 9.3 The convergence of the series (9.62) and the divergence of the series
$$\sum_{j=1}^{\infty} \prod_{i=1}^{j} \frac{\mu_i}{\lambda_i} \qquad (9.64)$$
are sufficient for the existence of a stationary state distribution. The divergence of the series (9.64) is, moreover, sufficient for the existence of such a time-dependent solution $\{p_0(t), p_1(t), \ldots\}$ of (9.47) which satisfies the normalizing condition (9.21).
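For a finite state space, the product formula (9.60) together with the normalization (9.61) translates into a few lines of code. The following Python sketch is an illustration under the assumption that all death rates $\mu_1, \ldots, \mu_n$ are positive; the function name `stationary_distribution` and the example rates are hypothetical.

```python
def stationary_distribution(birth, death):
    """Stationary probabilities pi_0, ..., pi_n of a finite birth and death process.

    birth = [lambda_0, ..., lambda_{n-1}], death = [mu_1, ..., mu_n].
    """
    weights = [1.0]                                   # pi_j / pi_0, starting with j = 0
    for lam_prev, mu in zip(birth, death):
        weights.append(weights[-1] * lam_prev / mu)   # product in (9.60)
    total = sum(weights)                              # equals 1 / pi_0 by (9.61)
    return [w / total for w in weights]


# Hypothetical rates for a chain with states 0, 1, 2, 3.
pi = stationary_distribution([1.0, 1.0, 1.0], [0.5, 1.0, 1.5])
print(pi)
```

The result can be checked against the first balance equation of (9.58), $\lambda_0 \pi_0 = \mu_1 \pi_1$.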


Example 9.14 (repairman problem) The repairman problem introduced in example 9.8 is considered once more. However, it is now assumed that there are $r$ mechanics for repairing failed machines, $1 \le r \le n$. A failed machine can be attended only by one mechanic. (For a modification of this assumption see example 9.15.) All other assumptions as well as the notation are as in example 9.8.

Figure 9.11 Transition graph of the general repairman problem

Let $X(t)$ denote the number of failed machines at time $t$. Then $\{X(t), t \ge 0\}$ is a birth and death process with state space $Z = \{0, 1, \ldots, n\}$. Its transition rates are
$$\lambda_j = (n-j)\,\lambda, \quad 0 \le j \le n,$$
$$\mu_j = \begin{cases} j\mu, & 0 \le j \le r,\\ r\mu, & r < j \le n \end{cases}$$
(Figure 9.11). Note that in this example the terminology 'birth and death rates' does not reflect the technological situation. If the service rate $\rho = \lambda/\mu$ is introduced, formulas (9.60) and (9.61) yield the stationary state probabilities
$$
\pi_j = \begin{cases} \dbinom{n}{j}\, \rho^j\, \pi_0; & 1 \le j \le r,\\[2ex] \dfrac{n!}{r^{\,j-r}\, r!\, (n-j)!}\, \rho^j\, \pi_0; & r \le j \le n, \end{cases} \qquad (9.65)
$$
$$\pi_0 = \left[ \sum_{j=0}^{r} \binom{n}{j} \rho^j + \sum_{j=r+1}^{n} \frac{n!}{r^{\,j-r}\, r!\, (n-j)!}\, \rho^j \right]^{-1}.$$

    Policy 1: n = 10, r = 2        Policy 2: n = 5, r = 1
     j    pi_{j,1}                  j    pi_{j,2}
     0    0.0341                    0    0.1450
     1    0.1022                    1    0.2175
     2    0.1379                    2    0.2611
     3    0.1655                    3    0.2350
     4    0.1737                    4    0.1410
     5    0.1564                    5    0.0004
     6    0.1173
     7    0.0704
     8    0.0316
     9    0.0095
    10    0.0014

Table 9.1 Stationary state probabilities for example 9.14


A practical application of the stationary state probabilities (9.65) is illustrated by a numerical example: Let $n = 10$, $\rho = 0.3$, and $r = 2$. The efficiencies of the following two maintenance policies will be compared:

1) Both mechanics are in charge of the repair of any of the 10 machines.

2) The mechanics are assigned 5 machines each for the repair of which they alone are responsible.

Let $X_{n,r}$ be the random number of failed machines and $Z_{n,r}$ the random number of mechanics which are busy with repairing failed machines, dependent on the number $n$ of machines and the number $r$ of available mechanics. From Table 9.1, for policy 1,
$$E(X_{10,2}) = \sum_{j=1}^{10} j\, \pi_{j,1} = 3.902,$$
$$E(Z_{10,2}) = 1 \cdot \pi_{1,1} + 2 \sum_{j=2}^{10} \pi_{j,1} = 1.8296.$$
For policy 2,
$$E(X_{5,1}) = \sum_{j=1}^{5} j\, \pi_{j,2} = 2.011,$$
$$E(Z_{5,1}) = 1 \cdot \pi_{1,2} + \sum_{j=2}^{5} \pi_{j,2} = 0.855.$$
Hence, when applying policy 2, the average number of failed machines out of 10 and the average number of busy mechanics out of 2 are
$$2E(X_{5,1}) = 4.022 \quad \text{and} \quad 2E(Z_{5,1}) = 1.710.$$

Thus, on the one hand, the mean number of failed machines under policy 1 is smaller than under policy 2, and, on the other hand, the mechanics are less busy under policy 2 than under policy 1. Hence, policy 1 should be preferred if no other relevant performance criteria have to be taken into account. □
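The Policy 1 figures can be recomputed from the transition rates of example 9.14. The Python sketch below builds the stationary probabilities recursively, which is equivalent to (9.60) and (9.61); the function name `repairman_stationary` is an assumption made for this illustration, and only policy 1 ($n = 10$, $r = 2$, $\rho = 0.3$) is evaluated.

```python
def repairman_stationary(n, r, rho):
    """Stationary probabilities pi_0, ..., pi_n of the repairman problem.

    Uses pi_j / pi_{j-1} = lambda_{j-1} / mu_j with lambda_{j-1} = (n - j + 1) * lam
    and mu_j = min(j, r) * mu, expressed through rho = lam / mu.
    """
    weights = [1.0]
    for j in range(1, n + 1):
        weights.append(weights[-1] * (n - j + 1) * rho / min(j, r))
    total = sum(weights)
    return [w / total for w in weights]


pi = repairman_stationary(n=10, r=2, rho=0.3)
mean_failed = sum(j * p for j, p in enumerate(pi))        # E(X_{10,2})
mean_busy = sum(min(j, 2) * p for j, p in enumerate(pi))  # E(Z_{10,2})
print([round(p, 4) for p in pi])
print(mean_failed, mean_busy)
```

Small deviations in the last digit relative to values derived from Table 9.1 are to be expected, because the table entries are rounded to four decimals.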
Example 9.15 The repairman problem of example 9.14 is modified in the following way: The available maintenance capacity of $r$ units (which need not necessarily be human) is always fully used for repairing failed machines. Thus, if only one machine has failed, then all $r$ units are busy with repairing this machine. If several machines are down, the full maintenance capacity of $r$ units is uniformly distributed to the failed machines. This adaptation is repeated after each failure of a machine and after each completion of a repair. In this case, no machines have to wait for repair.

If $j$ machines have failed, then the repair rate of each failed machine is $r\mu/j$. Therefore, the death rates of the corresponding birth and death process are constant, i.e., they do not depend on the system state:
$$\mu_j = j \cdot \frac{r\mu}{j} = r\mu; \quad j = 1, 2, \ldots, n.$$
The birth rates are the same as in example 9.14:
$$\lambda_j = (n-j)\,\lambda; \quad j = 0, 1, \ldots, n.$$
Thus, the stationary state probabilities are according to (9.60) and (9.61):
$$\pi_j = \frac{n!}{(n-j)!} \left(\frac{\lambda}{r\mu}\right)^{j} \pi_0; \quad j = 1, 2, \ldots, n,$$
$$\pi_0 = \left[ \sum_{j=0}^{n} \frac{n!}{(n-j)!} \left(\frac{\lambda}{r\mu}\right)^{j} \right]^{-1}.$$

Comparing this result with the stationary state probabilities (9.65), it is apparent that in case $r = 1$ the uniform distribution of the repair capacity to the failed machines has no influence on the stationary state probabilities. This fact is not surprising, since in this case the available maintenance capacity of one unit (if required) is always fully used.

Many of the results presented so far in section 9.6 are due to Kendall (1948).


9.6.3.3  Nonhomogeneous Birth and Death Processes 


Up till now, chapter 9 has been restricted to homogeneous Markov chains. They are characterized by transition rates which do not depend on time.

Nonhomogeneous Birth Processes 1) Nonhomogeneous Poisson process The simplest representative of a nonhomogeneous birth process is the nonhomogeneous Poisson process (page 274). Its birth rates are
$$\lambda_i(t) = \lambda(t); \quad i = 0, 1, \ldots.$$
Thus, the process makes a transition from state $i$ at time $t$ to state $i+1$ in $[t, t+\Delta t]$ with probability $\lambda(t)\,\Delta t + o(\Delta t)$.

2) Mixed Poisson process If certain conditions are fulfilled, mixed Poisson processes (section 7.2.3) belong to the class of nonhomogeneous birth processes.

Lundberg (1964) proved that a birth process is a mixed Poisson process if and only if its birth rates $\lambda_i(t)$ have the property
$$\lambda_{i+1}(t) = \lambda_i(t) - \frac{d \ln \lambda_i(t)}{dt}; \quad i = 0, 1, \ldots.$$
Equivalently, a pure birth process $\{X(t), t \ge 0\}$ with transition rates $\lambda_i(t)$ and with absolute state distribution
$$\{p_i(t) = P(X(t) = i); \ i = 0, 1, \ldots\}$$
is a mixed Poisson process if and only if
$$p_i(t) = \frac{t}{i}\, \lambda_{i-1}(t)\, p_{i-1}(t); \quad i = 1, 2, \ldots;$$
see also Grandell (1997).


Nonhomogeneous Linear Birth and Death Process Generalizing the birth and death process of example 9.11, now a birth and death process $\{X(t), t \ge 0\}$ is considered which has transition rates
$$\lambda_i(t) = \lambda(t)\, i, \quad \mu_i(t) = \mu(t)\, i; \quad i = 0, 1, \ldots$$
and initial distribution
$$p_1(0) = P(X(0) = 1) = 1.$$

Thus, $\lambda(t)$ can be interpreted as the transition rate from state 1 into state 2 at time $t$, and $\mu(t)$ is the transition rate from state 1 into the absorbing state 0 at time $t$. According to (9.47), the absolute state probabilities $p_j(t)$ satisfy
$$
\begin{aligned}
p_0'(t) &= \mu(t)\, p_1(t),\\
p_j'(t) &= (j-1)\lambda(t)\, p_{j-1}(t) - j\,(\lambda(t) + \mu(t))\, p_j(t) + (j+1)\mu(t)\, p_{j+1}(t); \quad j = 1, 2, \ldots.
\end{aligned}
$$
Hence, the corresponding $z$-transform $M(t,z)$ of $\{p_i(t) = P(X(t) = i); \ i = 0, 1, \ldots\}$ is given by the partial differential equation (9.50) with time-dependent $\lambda$ and $\mu$:
$$\frac{\partial M(t,z)}{\partial t} - (z-1)\,[\lambda(t)\, z - \mu(t)]\,\frac{\partial M(t,z)}{\partial z} = 0. \qquad (9.66)$$
The corresponding characteristic differential equation is a differential equation of Riccati type with time-dependent coefficients (compare with (9.51)):
$$\frac{dz}{dt} = -\lambda(t)\, z^2 + [\lambda(t) + \mu(t)]\, z - \mu(t).$$


A property of this differential equation is that there exist functions $\varphi_i(x)$; $i = 1, 2, 3, 4$, so that its general solution $z = z(t)$ can be implicitly written in the form
$$c = \frac{z\,\varphi_1(t) - \varphi_2(t)}{\varphi_3(t) - z\,\varphi_4(t)}.$$
Hence, for all differentiable functions $g(\cdot)$, the general solution of (9.66) has the form
$$M(t,z) = g\!\left( \frac{z\,\varphi_1(t) - \varphi_2(t)}{\varphi_3(t) - z\,\varphi_4(t)} \right).$$

From this and the initial condition $M(0,z) = z$ it follows that there exist two functions $a(t)$ and $b(t)$ so that
$$M(t,z) = \frac{a(t) + [1 - a(t) - b(t)]\, z}{1 - b(t)\, z}. \qquad (9.67)$$
By expanding $M(t,z)$ as a power series in $z$,
$$p_0(t) = a(t),$$
$$p_i(t) = [1 - a(t)]\,[1 - b(t)]\,[b(t)]^{\,i-1}; \quad i = 1, 2, \ldots. \qquad (9.68)$$

Inserting (9.67) in (9.66) and comparing the coefficients of $z$ yields a system of differential equations for $a(t)$ and $b(t)$:
$$(a'b - ab') + b' = \lambda\,(1-a)(1-b),$$
$$a' = \mu\,(1-a)(1-b).$$
The transformations $A = 1 - a$ and $B = 1 - b$ simplify this system to
$$B' = (\mu - \lambda)\, B - \mu B^2, \qquad (9.69)$$
$$A' = -\mu A B. \qquad (9.70)$$
The first differential equation is of Bernoulli type. Substituting in (9.69)
$$y(t) = 1/B(t)$$
gives a linear differential equation in $y$:
$$y' + (\mu - \lambda)\, y = \mu. \qquad (9.71)$$


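Since
$$a(0) = b(0) = 0,$$
$y$ satisfies $y(0) = 1$. Hence the solution of (9.71) is
$$y(t) = e^{-\omega(t)} \left[ \int_0^t e^{\omega(x)} \mu(x)\, dx + 1 \right],$$
where
$$\omega(t) = \int_0^t [\mu(x) - \lambda(x)]\, dx.$$
From (9.70) and (9.71),
$$\frac{A'}{A} = -\frac{\mu}{y} = -\frac{y'}{y} - \omega'.$$
Therefore, the desired functions $a$ and $b$ are
$$a(t) = 1 - \frac{1}{y(t)}\, e^{-\omega(t)}, \quad b(t) = 1 - \frac{1}{y(t)}, \quad t \ge 0.$$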

With $a(t)$ and $b(t)$ known, the one-dimensional probability distribution (9.68) of the nonhomogeneous birth and death process $\{X(t), t \ge 0\}$ is completely characterized. In particular, the probability that the process is in the absorbing state 0 at time $t$ is
$$p_0(t) = \frac{\int_0^t e^{\omega(x)} \mu(x)\, dx}{\int_0^t e^{\omega(x)} \mu(x)\, dx + 1}.$$
Hence, the process $\{X(t), t \ge 0\}$ will reach state 0 with probability 1 if the integral
$$\int_0^t e^{\omega(x)} \mu(x)\, dx \qquad (9.72)$$
diverges as $t \to \infty$. A necessary condition for this is $\mu(x) \ge \lambda(x)$ for all $x \ge 0$.

Let $L$ denote the first passage time of the process with regard to state 0, i.e.,
$$L = \inf_t \{t, \ X(t) = 0\}.$$

Since state 0 is absorbing, it is justified to call $L$ the lifetime of the process. On condition that the integral (9.72) diverges as $t \to \infty$, $L$ has distribution function
$$F_L(t) = P(L \le t) = p_0(t), \quad t \ge 0.$$
Mean value and variance of $X(t)$ are
$$E(X(t)) = e^{-\omega(t)}, \qquad (9.73)$$
$$Var(X(t)) = e^{-2\omega(t)} \int_0^t e^{\omega(x)}\, [\lambda(x) + \mu(x)]\, dx. \qquad (9.74)$$
If the process $\{X(t), t \ge 0\}$ starts at $s = 2, 3, \ldots$, i.e., it has the initial distribution
$$p_s(0) = P(X(0) = s) = 1 \quad \text{for an } s = 2, 3, \ldots,$$
then the corresponding $z$-transform is
$$M(t,z) = \left( \frac{a(t) + [1 - a(t) - b(t)]\, z}{1 - b(t)\, z} \right)^{s}.$$

In this case, mean value and variance of $X(t)$ are simply obtained by multiplying (9.73) and (9.74), respectively, by $s$.
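For time-dependent rates, the integral (9.72) usually has to be evaluated numerically. The Python sketch below assumes that $\lambda(t)$ and $\mu(t)$ are supplied as callables and approximates both $\omega(t)$ and the integral (9.72) with the trapezoidal rule; the function name `extinction_probability`, the step count, and the example rate functions are illustrative assumptions.

```python
import math


def extinction_probability(lam, mu, t, steps=20000):
    """Approximate p_0(t) = I / (I + 1), where I is the integral (9.72)
    and omega(x) is the integral of mu - lam over [0, x]."""
    h = t / steps
    omega = 0.0
    integral = 0.0
    prev_rate = mu(0.0) - lam(0.0)
    prev_integrand = mu(0.0)            # exp(omega(0)) = 1
    for k in range(1, steps + 1):
        x = k * h
        rate = mu(x) - lam(x)
        omega += 0.5 * h * (prev_rate + rate)
        integrand = math.exp(omega) * mu(x)
        integral += 0.5 * h * (prev_integrand + integrand)
        prev_rate, prev_integrand = rate, integrand
    return integral / (integral + 1.0)


# Constant rates reproduce the homogeneous case of example 9.11.
p0_const = extinction_probability(lambda x: 1.0, lambda x: 2.0, 1.5)
# Hypothetical time-dependent rates: decaying birth rate, constant death rate.
p0_var = extinction_probability(lambda x: 1.0 / (1.0 + x), lambda x: 1.0, 1.5)
print(p0_const, p0_var)
```

With constant rates the result can be compared with the closed-form $p_0(t)$ of example 9.11, which gives a simple consistency check of the numerical scheme.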
9.7 APPLICATIONS TO QUEUEING SYSTEMS


9.7.1 Basic Concepts 


One of the most important applications of continuous-time Markov chains is stochastic modeling of service facilities. The basic situation is the following: Customers arrive at a service system (queueing system) according to a random point process. If all servers are busy, an arriving customer either waits for service or leaves the system without having been served. Otherwise, an available server takes care of the customer. After random service times customers leave the system. The arriving customers constitute the input (input flow, traffic, flow of demands) and the leaving customers the output (output flow) of the queueing system. A queueing system is called a loss system if it has no waiting capacity for customers which do not find an available server on arriving at the system. These customers leave the system immediately after arrival and are said to be lost. A waiting system has unlimited waiting capacity for those customers who do not immediately find an available server and are willing to wait any length of time for service. A waiting-loss system has only limited waiting capacity for customers. An arriving customer is lost if it finds all servers busy and the waiting capacity fully occupied. A single-server queueing system has only one server, whereas a multi-server queueing system has at least two servers. 'Customers' or 'servers' need not be persons.

Figure 9.12 Scheme of a standard queueing system (input, waiting, service, loss)


Supermarkets are simple examples of queueing systems. Their customers are served 
at checkout counters. Filling stations also can be thought of as queueing systems 
with petrol pumps being the servers. Even a car park has the typical features of a 
waiting system. In this case, the parking lots are the 'servers' and the 'service times' 
are generated by the customers themselves. An anti-aircraft battery is a queueing system in the sense that it 'serves' the enemy aircraft. During recent years the stochastic modeling of communication systems, in particular computer networks, has stimulated the application of standard queueing models and the creation of new, more sophisticated ones. But the investigation of queueing systems goes back to the Danish engineer A. K. Erlang in the early 1900s, when he was in charge of designing telephone exchanges to meet criteria such as 'what is the mean waiting time of a customer before being connected' or 'how many lines (servers) are necessary to guarantee that with a given probability a customer can immediately be connected'?




The repairman problem considered in example 9.14 also fits into the framework of a queueing system. The successive failing of machines generates an input flow and the mechanics are the servers. This example is distinguished by a particular feature: each demand (customer) is produced by one of a finite number $n$ of different sources 'inside the system', namely by one of the $n$ machines. Classes of queueing systems having this particular feature are called closed queueing systems.


The global objective of queueing theory is to provide theoretical tools for the design 
and the quantitative analysis of service systems. Designing engineers of service sys- 
tems need to make sure that the required service can be reliably delivered at minimal 
expense, since managers of service systems do not want to 'employ' more servers than 
necessary for meeting given performance criteria. Important criteria are: 


1) The probability that an arriving customer finds an available server. 
2) The mean waiting time of a customer for service. 


3) The total sojourn time of a customer in the system. 


It is common practice to characterize the structure of standard queueing systems by Kendall's notation A/B/s/m. In this code, A characterizes the input and B the service, s is the number of servers, and waiting capacity is available for m customers. Using this notation, standard classes of queueing systems are:

A = M (Markov): Customers arrive in accordance with a homogeneous Poisson process (Poisson input).

A = GI (general independent): Customers arrive in accordance with an ordinary renewal process (recurrent input).

A = D (deterministic): The distances between the arrivals of neighbouring customers are constant (deterministic input).

B = M (Markov): The service times are independent, identically distributed exponential random variables.

B = G (general): The service times are independent, identically distributed random variables with arbitrary probability distribution.

For instance, M/M/1/0 is a loss system with Poisson input, one server, and exponential service times. GI/M/3/∞ is a waiting system with recurrent input, exponential service times, and 3 servers. For queueing systems with an infinite number of servers, no waiting capacity is necessary. Hence their code is A/B/∞.


In waiting systems and waiting-loss systems there are several ways of choosing waiting customers for service. These possibilities are called service disciplines (queueing disciplines). The most important ones are:

1) FCFS (first come-first served) Waiting customers are served in accordance with the order of their arrival. This discipline is also called FIFO (first in-first out), although 'first in' does not necessarily imply 'first out'.

2) LCFS (last come-first served) The customer who arrived last is served first. This discipline is also called LIFO (last in-first out).

3) SIRO (service in random order) A server, when having finished with a customer, randomly picks one of the waiting customers for service.


There is a close relationship between service disciplines and priority (queueing) sys- 
tems: In a priority system arriving customers have different priorities of being served. 
A customer with higher priority is served before a customer with lower priority, but 
no interruption of ongoing service takes place (head of the line priority discipline). 
When a customer with absolute priority arrives and finds all servers busy, then the 
service of a customer with lower priority has to be interrupted (preemptive priority 
discipline). 


System Parameters and Assumptions In this chapter, if not stated otherwise, the interarrival times of customers are assumed to be independent and identically distributed as a random variable $Y$. The intensity of the input flow (mean number of arriving customers per unit time) is denoted as $\lambda$ and referred to as arrival rate or arrival intensity. The service times of all servers, if not stated otherwise, are assumed to be independent and identically distributed as a random variable $Z$. The service intensity or service rate of the servers is denoted as $\mu$, i.e. $\mu$ is the mean number of customers served per unit time by a server. Hence,
$$E(Y) = 1/\lambda \quad \text{and} \quad E(Z) = 1/\mu.$$
The traffic intensity of a queueing system is defined as the ratio
$$\rho = \lambda/\mu.$$
Usually, the state of the system is fully characterized by the number $X(t)$ of customers which are in the system at time $t$ (waiting or being served). If the stochastic process $\{X(t), t \ge 0\}$ has eventually become stationary, then we say the queueing system is in the steady state. When the system is in the steady state, then the time dependence of its characteristic parameters, in particular of the state probabilities
$$p_j(t) = P(X(t) = j); \quad j = 0, 1, \ldots,$$
has levelled out; they are constant. This will happen after a sufficiently long operating time. In this case, the probability distribution of $X(t)$ no longer depends on $t$ so that $X(t)$ is simply written as $X$. Then
$$\{\pi_j = \lim_{t\to\infty} P(X(t) = j) = P(X = j); \ j = 0, 1, \ldots, s+m; \ s, m \le \infty\}$$
is the stationary probability distribution of $\{X(t), t \ge 0\}$.

Let $S$ denote the random number of busy servers in the steady state of the system. Then its degree of server utilization is
$$\eta = E(S)/s.$$
The coefficient $\eta$ can be interpreted as the mean proportion of time a server is busy.
9.7.2 Loss Systems


9.7.2.1 M/M/∞-System


Strictly speaking, this system is neither a loss nor a waiting system. In this model, the stochastic process $\{X(t), t \ge 0\}$ is a homogeneous birth and death process with state space $Z = \{0, 1, \ldots\}$ and transition rates (see example 9.12)
$$\lambda_i = \lambda; \quad \mu_i = i\mu; \quad i = 0, 1, \ldots.$$
The corresponding time-dependent state probabilities $p_j(t)$ of this queueing system are given by (9.55). The stationary state probabilities are obtained by passing to the limit as $t \to \infty$ in these $p_j(t)$ or by inserting the transition rates $\lambda_i = \lambda$ and $\mu_i = i\mu$ with $n = \infty$ into (9.60) and (9.61):
$$\pi_j = \frac{\rho^j}{j!}\, e^{-\rho}; \quad j = 0, 1, \ldots. \qquad (9.75)$$

This is a Poisson distribution with parameter $\rho = \lambda/\mu$. Hence, in the steady state the mean number of busy servers is equal to the traffic intensity of the system: $E(X) = \rho$.


9.7.2.2 M/M/s/0-System

In this case, $\{X(t), t \ge 0\}$ is a birth and death process with $Z = \{0, 1, \ldots, s\}$ and
$$\lambda_i = \lambda; \ i = 0, 1, \ldots, s-1; \quad \lambda_i = 0 \ \text{for } i \ge s,$$
$$\mu_i = i\mu; \ i = 0, 1, \ldots, s.$$

Inserting these transition rates into the stationary state probabilities (9.60) and (9.61) with $n = s$ yields
$$\pi_0 = \left[ \sum_{i=0}^{s} \frac{\rho^i}{i!} \right]^{-1}; \quad \pi_j = \frac{1}{j!}\, \rho^j\, \pi_0; \quad j = 0, 1, \ldots, s. \qquad (9.76)$$

The probability $\pi_0$ is called vacant probability. The loss probability, i.e., the probability that an arriving customer does not find an idle server and hence leaves the system immediately, is
$$\pi_s = \frac{\rho^s / s!}{\sum_{i=0}^{s} \rho^i / i!}. \qquad (9.77)$$

This is the famous Erlang loss formula. The following recursive formula for the loss probability as a function of $s$ can easily be verified:
$$\pi_0 = 1 \ \text{for } s = 0; \quad \frac{1}{\pi_s} = \frac{s}{\rho}\, \frac{1}{\pi_{s-1}} + 1; \quad s = 1, 2, \ldots.$$

The mean number of busy servers is
$$E(X) = \sum_{i=1}^{s} i\, \pi_i = \sum_{i=1}^{s} i\, \frac{\rho^i}{i!}\, \pi_0 = \rho \sum_{i=0}^{s-1} \frac{\rho^i}{i!}\, \pi_0.$$
Combining this result with (9.76) and (9.77) yields
$$E(X) = \rho\,(1 - \pi_s).$$
Hence, the degree of server utilization is
$$\eta = \frac{\rho}{s}\,(1 - \pi_s).$$

Single-Server Loss System In case $s = 1$ vacant and loss probability are
$$\pi_0 = \frac{1}{1+\rho} \quad \text{and} \quad \pi_1 = \frac{\rho}{1+\rho}. \qquad (9.78)$$
Since $\rho = E(Z)/E(Y)$,
$$\pi_0 = \frac{E(Y)}{E(Y) + E(Z)}, \quad \pi_1 = \frac{E(Z)}{E(Y) + E(Z)}.$$

Hence, $\pi_0$ ($\pi_1$) is formally equal to the stationary availability (nonavailability) of a system with mean lifetime $E(Y)$ and mean renewal time $E(Z)$ the operation of which is governed by an alternating renewal process (formula (7.14), page 322).


Example 9.16 A ‘classical application' (no longer of practical relevance) of loss 
models of type M/M/s/0 is a telephone exchange. Assume that the input (calls of 
subscribers wishing to be connected) has intensity 2, = 2[min7!]. Thus, the mean time 
between successive calls is E(Y)=1/A=0.5 [min]. On average, each subscriber 
occupies a line for E(Z) = 1/u = 3 [min]. 


1) What is the loss probability in case of s = 7 lines? 
The corresponding traffic intensity is p = A/u = 6. Thus, the loss probability equals 


T 716! 0.185 
7= 2 ar G5 = G6. 467 ae 
6, 63, 64, 6 , 6° 6 
464574 ar tay ts t ort a 


Hence, the mean number of occupied lines is 
E(X) = p(. —- 27) = 6(1 — 0.185) = 4.89, 
and the degree of server (line) utilization is 
nN = (7) = 4.89/7 = 0.698. 


2) What is the minimal number of lines which have to be provided in order to make
sure that at least 95% of the desired connections can be made?

The respective loss probabilities for $s = 9$ and $s = 10$ are
\[ \pi_9 = 0.075 \quad \text{and} \quad \pi_{10} = 0.043. \]
Hence, the minimal number of lines required is $s = 10$. In this case, however, the
degree of server utilization is smaller than with $s = 7$ lines:
\[ \eta = \eta(10) = 0.574. \qquad \square \]
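
The figures of example 9.16 can be reproduced numerically. The following Python sketch
is only an illustration: the function names are hypothetical and not part of the text.
It evaluates the Erlang loss formula (9.77) both directly and via the recursion for the
reciprocal loss probability, and then searches for the smallest number of lines that
keeps the loss probability at or below 5%.

```python
from math import factorial


def erlang_loss_direct(rho, s):
    """Loss probability of M/M/s/0 from the closed form (9.77)."""
    terms = [rho**i / factorial(i) for i in range(s + 1)]
    return terms[s] / sum(terms)


def erlang_loss_recursive(rho, s):
    """Same probability via 1/pi_s = (s/rho) * (1/pi_{s-1}) + 1."""
    inverse = 1.0  # reciprocal of the loss probability for s = 0
    for k in range(1, s + 1):
        inverse = (k / rho) * inverse + 1.0
    return 1.0 / inverse


def min_servers(rho, max_loss):
    """Smallest number of servers whose loss probability is at most max_loss."""
    s = 0
    while erlang_loss_recursive(rho, s) > max_loss:
        s += 1
    return s


rho = 6.0                                # traffic intensity of example 9.16
loss_7 = erlang_loss_recursive(rho, 7)   # loss probability with 7 lines
busy_7 = rho * (1.0 - loss_7)            # mean number of occupied lines
lines_needed = min_servers(rho, 0.05)    # at most 5% of the calls are lost
print(f"{loss_7:.3f} {busy_7:.2f} {lines_needed}")
```

The recursion avoids large factorials and powers, which is why it is usually preferred
over the closed form when $s$ is large.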
It is interesting and practically important that the stationary state probabilities of the
queueing system M/G/s/0 also have the structure (9.76). That is, if the respective
traffic intensities of the systems M/M/s/0 and M/G/s/0 are equal, then their stationary
state probabilities coincide: for both systems they are given by (9.76). A corresponding
result holds for the queueing systems M/M/∞ and M/G/∞. (Compare the stationary
state probabilities (9.75) with the stationary state probabilities (7.37) (page 274)
for the M/G/∞-system.) Queueing systems having this property are said to be
insensitive with respect to the probability distribution of the service time. An analogous
property can be defined with regard to the input. In view of (9.78), the M/M/1/0-system
is insensitive both with regard to arrival and service time distributions (full
insensitiveness). A comprehensive investigation of the insensitiveness of queueing
systems and other stochastic models is given in the handbook on queueing theory by
Gnedenko, König (1983).


9.7.2.3 Engset's Loss System 


Assume that $n$ sources generate $n$ independent Poisson inputs with common intensity
$\lambda$, which are served by $s$ servers, $s \le n$. The service times are independent,
exponentially distributed random variables with parameter $\mu$. As long as a customer from a
particular source is being served, this source cannot produce another customer. (Compare
to the repairman problem, example 9.14: during the repair of a machine, this
machine cannot produce another demand for repair.) A customer which does not find
an available server is lost. Let $X(t)$ denote the number of customers being served at
time $t$. Then $\{X(t), t \ge 0\}$ is a birth and death process with state space $Z = \{0, 1, \ldots, s\}$.
In case $X(t) = j$, only $n - j$ sources are active, that is, they are able to generate customers.
Therefore, the transition rates of this birth and death process are

\[ \lambda_j = (n - j)\,\lambda; \quad j = 0, 1, 2, \ldots, s-1, \]
\[ \mu_j = j\,\mu; \quad j = 1, 2, \ldots, s. \]


Figure 9.13 Engset's loss system in state X(t) = j
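Inserting these transition rates into (9.60) and (9.61), with the upper state index $n$ of
these formulas equal to $s$, yields the stationary state distribution for Engset's loss
system. With $\rho = \lambda/\mu$,

\[ \pi_0 = \left[\sum_{i=0}^{s} \binom{n}{i} \rho^i\right]^{-1}, \qquad \pi_j = \frac{\binom{n}{j}\rho^j}{\sum_{i=0}^{s} \binom{n}{i} \rho^i}; \quad j = 0, 1, \ldots, s. \]

Like the repairman problem considered in example 9.14, Engset's loss system is a
closed queueing system.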
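
As a plausibility check, the stationary distribution of Engset's loss system can be
computed in two ways: from the binomial closed form and directly from the birth rates
$(n-j)\lambda$ and death rates $j\mu$. The Python sketch below is an illustration under these
assumptions; the function names and the parameter values are hypothetical.

```python
from math import comb


def engset_closed_form(n, s, rho):
    """pi_0..pi_s from the binomial form, with rho = lambda / mu."""
    weights = [comb(n, j) * rho**j for j in range(s + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def engset_from_rates(n, s, lam, mu):
    """pi_0..pi_s from the birth rates (n - j) * lam and death rates j * mu."""
    weights = [1.0]  # unnormalized pi_j / pi_0
    for j in range(1, s + 1):
        weights.append(weights[-1] * (n - (j - 1)) * lam / (j * mu))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: 10 sources, 3 servers, lambda = 0.5, mu = 2.
closed = engset_closed_form(10, 3, 0.5 / 2.0)
by_rates = engset_from_rates(10, 3, 0.5, 2.0)
print(closed)
```

Both functions return the same list; the last entry is the probability that all $s$
servers are busy.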


9.7.3 Waiting Systems 


9.7.3.1 M/M/s/∞-System


The Markov chain $\{X(t), t \ge 0\}$, which models this system, is defined as follows: If
$X(t) = j$ with $0 \le j \le s$, then $j$ servers are busy at time $t$. If $X(t) = j$ with $j > s$, then $s$
servers are busy and $j - s$ customers are waiting for service. In either case, $X(t)$ is the
total number of customers in the queueing system at time $t$. $\{X(t), t \ge 0\}$ is a birth and
death process with state space $Z = \{0, 1, \ldots\}$ and transition rates

\[ \lambda_j = \lambda; \quad j = 0, 1, \ldots, \]
\[ \mu_j = j\mu \ \text{ for } j = 0, 1, \ldots, s; \qquad \mu_j = s\mu \ \text{ for } j > s. \tag{9.79} \]

In what follows it is assumed that
\[ \rho = \lambda/\mu < s. \]

If $\rho > s$, then the arrival intensity $\lambda$ of customers is greater than the maximum service
rate $s\mu$ of the system so that, at least in the long run, the system cannot cope with the
input, and the length of the waiting queue will tend to infinity as $t \to \infty$. Hence, no
equilibrium (steady) state between arriving and leaving customers is possible. On the
other hand, the condition $\rho < s$ is necessary and sufficient for the existence of a stationary
state distribution, since in this case the corresponding series (9.62) converges and
condition (9.63) is fulfilled.
Inserting the transition rates (9.79) into (9.60) yields

\[ \pi_j = \begin{cases} \dfrac{\rho^j}{j!}\,\pi_0 & \text{for } j = 0, 1, \ldots, s-1, \\[2ex] \dfrac{\rho^j}{s!\,s^{\,j-s}}\,\pi_0 & \text{for } j \ge s. \end{cases} \tag{9.80} \]
The normalizing condition and the geometric series (formula (2.16), page 48) yield
the vacant probability $\pi_0$:

\[ \pi_0 = \left[\sum_{i=0}^{s-1} \frac{\rho^i}{i!} + \frac{\rho^s}{(s-1)!\,(s-\rho)}\right]^{-1}. \]

The probability $\pi_W$ that an arriving customer finds all servers busy is
\[ \pi_W = \sum_{i=s}^{\infty} \pi_i. \]

$\pi_W$ is called waiting probability, since it is the probability that an arriving customer
must wait for service. Making use of the geometric series again yields a simple formula
for $\pi_W$:

\[ \pi_W = \frac{\pi_s}{1 - \rho/s}. \tag{9.81} \]


In what follows, all derivations refer to the system in the steady state. If $S$ denotes
the random number of busy servers, then its mean value is

\[ E(S) = \sum_{i=0}^{s-1} i\,\pi_i + s\,\pi_W. \tag{9.82} \]

From this,
\[ E(S) = \rho. \tag{9.83} \]
(The details of the derivation of (9.83) are left as an exercise to the reader.) Also
without proof: Formula (9.83) holds for any GI/G/s/∞-system. Hence the degree of
server utilization in the M/M/s/∞-system is $\eta = \rho/s$. By making use of (9.83), the
mean value of the total number $X$ of customers in the system is seen to be

\[ E(X) = \sum_{i=1}^{\infty} i\,\pi_i = \rho \left[1 + \frac{s}{(s-\rho)^2}\,\pi_s\right]. \tag{9.84} \]


Let $L$ denote the random number of customers waiting for service (queue length).
Then the mean queue length is

\[ E(L) = \sum_{i=s}^{\infty} (i - s)\,\pi_i = \sum_{i=s}^{\infty} i\,\pi_i - s\,\pi_W. \]
Combining this formula with (9.82) to (9.84) yields

\[ E(L) = \frac{\rho\,s}{(s-\rho)^2}\,\pi_s. \tag{9.85} \]


Waiting Time Distribution  Let $W$ be the random time a customer has to wait for
service if the service discipline FCFS is in effect. By the total probability rule,

\[ P(W > t) = \sum_{i=s}^{\infty} P(W > t \mid X = i)\,\pi_i. \tag{9.86} \]
If a customer enters the system when it is in state $X = i \ge s$, then all servers are busy
so that the current output is a Poisson process with intensity $s\mu$. The random event
'$W > t$' occurs if within $t$ time units after the arrival of a customer the service of at
most $i - s$ customers has been finished. The probability that the service of
precisely $k$ customers, $0 \le k \le i - s$, will be finished in this interval of length $t$ is

\[ \frac{(s\mu t)^k}{k!}\,e^{-s\mu t}. \]


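Hence,

\[ P(W > t \mid X = i) = e^{-s\mu t} \sum_{k=0}^{i-s} \frac{(s\mu t)^k}{k!} \]
and, by (9.86),

\[ P(W > t) = e^{-s\mu t} \sum_{i=s}^{\infty} \pi_i \sum_{k=0}^{i-s} \frac{(s\mu t)^k}{k!} = \pi_0\,e^{-s\mu t} \sum_{i=s}^{\infty} \frac{\rho^i}{s!\,s^{\,i-s}} \sum_{k=0}^{i-s} \frac{(s\mu t)^k}{k!}. \]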


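Performing the index transformation $j = i - s$, changing the order of summation
according to formula (2.115), page 99, and making use of both the exponential series
and the geometric series (page 48) yields

\[
\begin{aligned}
P(W > t) &= \pi_0\,\frac{\rho^s}{s!}\,e^{-s\mu t} \sum_{j=0}^{\infty} \left(\frac{\rho}{s}\right)^{j} \sum_{k=0}^{j} \frac{(s\mu t)^k}{k!} \\
&= \pi_s\,e^{-s\mu t} \sum_{k=0}^{\infty} \frac{(s\mu t)^k}{k!} \sum_{j=k}^{\infty} \left(\frac{\rho}{s}\right)^{j} \\
&= \pi_s\,e^{-s\mu t} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!}\,\frac{1}{1 - \rho/s} = \frac{s}{s-\rho}\,\pi_s\,e^{-s\mu t}\,e^{\lambda t}.
\end{aligned}
\]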


Hence, the distribution function of $W$ is
\[ F_W(t) = P(W \le t) = 1 - \frac{s}{s-\rho}\,\pi_s\,e^{-\mu (s-\rho)\,t}, \quad t \ge 0. \]
Note that $P(W > 0)$ is the waiting probability (9.81):
\[ \pi_W = P(W > 0) = 1 - F_W(0) = \frac{s}{s-\rho}\,\pi_s. \]


The mean waiting time of a customer is

\[ E(W) = \int_0^{\infty} P(W > t)\,dt = \frac{s}{\mu\,(s-\rho)^2}\,\pi_s. \tag{9.87} \]
A comparison of (9.85) and (9.87) yields Little's formula or Little's law:
\[ E(L) = \lambda\,E(W). \tag{9.88} \]

Little's formula can be motivated as follows: The mean value of the sum of the waiting
times arising in an interval of length $\tau$ is $\tau E(L)$. On the other hand, the same
mean value is given by $\lambda \tau E(W)$, since the mean number of customers arriving in an
interval of length $\tau$ is $\lambda \tau$. Hence,

\[ \tau\,E(L) = \lambda\,\tau\,E(W), \]

which is Little's formula.
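
The steady-state measures of the M/M/s/∞-system derived above lend themselves to a
short numerical check. The following Python sketch (function names and parameter
values are hypothetical, chosen only for illustration) evaluates the vacant probability,
the waiting probability (9.81), the mean queue length (9.85), and the mean waiting time
(9.87), so that Little's formula (9.88) can be verified numerically.

```python
from math import exp, factorial


def mms_measures(lam, mu, s):
    """Steady-state measures of the M/M/s/infinity system (requires rho < s)."""
    rho = lam / mu
    if rho >= s:
        raise ValueError("no stationary distribution unless rho < s")
    pi_0 = 1.0 / (sum(rho**i / factorial(i) for i in range(s))
                  + rho**s / (factorial(s - 1) * (s - rho)))
    pi_s = rho**s / factorial(s) * pi_0
    return {
        "pi_0": pi_0,
        "pi_s": pi_s,
        "wait_prob": pi_s / (1.0 - rho / s),            # (9.81)
        "mean_queue": rho * s / (s - rho)**2 * pi_s,     # (9.85)
        "mean_wait": s / (mu * (s - rho)**2) * pi_s,     # (9.87)
    }


def wait_survival(lam, mu, s, t):
    """P(W > t) = s/(s - rho) * pi_s * exp(-mu * (s - rho) * t) for t >= 0."""
    rho = lam / mu
    pi_s = mms_measures(lam, mu, s)["pi_s"]
    return s / (s - rho) * pi_s * exp(-mu * (s - rho) * t)


result = mms_measures(lam=4.0, mu=1.0, s=6)    # hypothetical parameter values
little_gap = result["mean_queue"] - 4.0 * result["mean_wait"]
print(result, little_gap)
```

For $s = 1$ the function reduces to the familiar M/M/1 values $\pi_0 = 1 - \rho$ and
$E(L) = \rho^2/(1-\rho)$.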
With $E(X)$ given by (9.84), an equivalent representation of Little's formula is
\[ E(X) = \lambda\,E(T), \tag{9.89} \]

where $T$ is the total sojourn time of a customer in the system, i.e., waiting plus
service time: $T = W + Z$. Hence, the mean value of $T$ is

\[ E(T) = E(W) + 1/\mu. \]

Little's formula holds for any GI/G/s/∞-system. For a proof of this proposition and
other 'Little type formulas' see Franken et al. (1981).


9.7.3.2 M/G/1/∞-System


In this single-server system, the service time $Z$ is assumed to have an arbitrary
probability density $g(t)$ and a finite mean $E(Z) = 1/\mu$. Hence, the corresponding stochastic
process $\{X(t), t \ge 0\}$ describing the development in time of the number of customers
in the system need no longer be a homogeneous Markov chain as in the previous
queueing models. However, there exists an embedded homogeneous discrete-time
Markov chain, which can be used to analyze this system (see section 9.4).


The system starts operating at time $t = 0$. Customers arrive according to a homogeneous
Poisson process with positive intensity $\lambda$. Let $A$ be the random number of customers
which arrive whilst a customer is being served, and

\[ \{a_i = P(A = i);\ i = 0, 1, \ldots\} \]

be its probability distribution. To determine the $a_i$, note that the conditional probability
that during a service time of length $Z = t$ exactly $i$ new customers arrive is

\[ \frac{(\lambda t)^i}{i!}\,e^{-\lambda t}. \]
Hence,

\[ a_i = \int_0^{\infty} \frac{(\lambda t)^i}{i!}\,e^{-\lambda t}\,g(t)\,dt; \quad i = 0, 1, \ldots \]
This and the exponential series (page 48) yield the z-transform $M_A(z)$ of $A$:
\[ M_A(z) = \sum_{i=0}^{\infty} a_i z^i = \int_0^{\infty} e^{-(\lambda - \lambda z)\,t}\,g(t)\,dt. \]
Consequently, if $\hat g(\cdot)$ denotes the Laplace transform of $g(t)$, then
\[ M_A(z) = \hat g(\lambda - \lambda z). \tag{9.90} \]
By formula (2.112) (page 96), letting as usual $\rho = \lambda/\mu$, the mean value of $A$ is

\[ E(A) = \left.\frac{dM_A(z)}{dz}\right|_{z=1} = -\lambda \left.\frac{d\hat g(r)}{dr}\right|_{r=0} = \rho. \tag{9.91} \]
Embedded Markov Chain  Let $T_n$ be the random time point at which the $n$th customer
leaves the system. If $X_n$ denotes the number of customers in the system immediately
after $T_n$, then $\{X_1, X_2, \ldots\}$ is a homogeneous, discrete-time Markov chain
with state space $Z = \{0, 1, \ldots\}$ and one-step transition probabilities

\[ p_{ij} = P(X_{n+1} = j \mid X_n = i) = \begin{cases} a_j & \text{if } i = 0 \text{ and } j = 0, 1, 2, \ldots \\ a_{j-i+1} & \text{if } i - 1 \le j \text{ and } i = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{9.92} \]

for all $n = 0, 1, \ldots$; $X_0 = 0$. This Markov chain is embedded in $\{X(t), t \ge 0\}$ since
\[ X_n = X(T_n + 0); \quad n = 0, 1, \ldots \]


The discrete-time Markov chain $\{X_0, X_1, \ldots\}$ is irreducible and aperiodic. Hence, on
condition $\rho = \lambda/\mu < 1$ it has a stationary state distribution $\{\pi_0, \pi_1, \ldots\}$, which can be
obtained by solving the corresponding system of algebraic equations (8.9) (see page
342): Inserting the transition probabilities $p_{ij}$ given by (9.92) into (8.9) gives

\[ \pi_0 = a_0\,(\pi_0 + \pi_1), \]
\[ \pi_j = \pi_0\,a_j + \sum_{i=1}^{j+1} \pi_i\,a_{j-i+1}; \quad j = 1, 2, \ldots \tag{9.93} \]

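Let $M_X(z)$ be the z-transform of the state $X$ of the system in the steady state:

\[ M_X(z) = \sum_{j=0}^{\infty} \pi_j z^j. \]
Then, multiplying (9.93) by $z^j$ and summing up from $j = 0$ to $\infty$ yields

\[
\begin{aligned}
M_X(z) &= \pi_0 \sum_{j=0}^{\infty} a_j z^j + \sum_{j=0}^{\infty} z^j \sum_{i=1}^{j+1} \pi_i\,a_{j-i+1} \\
&= \pi_0\,M_A(z) + M_A(z) \sum_{i=1}^{\infty} \pi_i z^{i-1} \\
&= \pi_0\,M_A(z) + M_A(z)\,\frac{M_X(z) - \pi_0}{z}.
\end{aligned}
\]

Solving this equation for $M_X(z)$ yields

\[ M_X(z) = \pi_0\,M_A(z)\,\frac{1 - z}{M_A(z) - z}, \quad |z| < 1. \tag{9.94} \]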
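To determine $\pi_0$, note that
\[ M_A(1) = M_X(1) = 1 \]
and

\[ \lim_{z \uparrow 1} \frac{M_A(z) - z}{1 - z} = \lim_{z \uparrow 1} \left(1 + \frac{M_A(z) - 1}{1 - z}\right) = 1 - \left.\frac{dM_A(z)}{dz}\right|_{z=1} = 1 - \rho. \]

Therefore, by letting $z \uparrow 1$ in (9.94),
\[ \pi_0 = 1 - \rho. \tag{9.95} \]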
Combining (9.90), (9.94), and (9.95) yields the formula of Pollaczek-Khinchin:

\[ M_X(z) = (1 - \rho)\,\frac{1 - z}{1 - \dfrac{z}{\hat g(\lambda - \lambda z)}}, \quad |z| < 1. \tag{9.96} \]

According to its derivation, this formula gives the z-transform of the stationary
distribution of the random number $X$ of customers in the system immediately after the
completion of a customer's service. In view of the homogeneous Poisson input, it is
even the stationary probability distribution of the 'original' process $\{X(t), t \ge 0\}$
itself. Thus, $X$ is the random number of customers in the system in its steady state. Its
probability distribution $\{\pi_0, \pi_1, \ldots\}$ exists and is a solution of (9.93). Hence, numerical
parameters such as mean value and variance of the number of customers in the system
in the steady state can be determined from (9.96) via formulas (2.112), page 96. For
instance, the mean number of customers in the system is


\[ E(X) = \left.\frac{dM_X(z)}{dz}\right|_{z=1} = \rho + \frac{\lambda^2\,[(E(Z))^2 + Var(Z)]}{2\,(1 - \rho)}. \tag{9.97} \]


Sojourn Time  Let $T$ be the time a customer spends in the system (sojourn time) if
the FCFS queueing discipline is in effect. Then $T$ has structure

\[ T = W + Z, \]
where $W$ is the time a customer has to wait for service (waiting time). Let $F_T(t)$ and
$F_W(t)$ be the respective distribution functions of $T$ and $W$, and $f_T(t)$ and $f_W(t)$ the
corresponding densities with Laplace transforms $\hat f_T(r)$ and $\hat f_W(r)$. Since $W$ and $Z$
are independent,

\[ \hat f_T(r) = \hat f_W(r)\,\hat g(r). \tag{9.98} \]


The number of customers in the system after the departure of a served one is equal to
the number of customers which arrived during the sojourn time of this customer.
Hence, analogously to the structure of the $a_i$, the probabilities $\pi_i$ are given by

\[ \pi_i = \int_0^{\infty} \frac{(\lambda t)^i}{i!}\,e^{-\lambda t}\,f_T(t)\,dt; \quad i = 0, 1, \ldots \]

The corresponding z-transform $M_X(z)$ of $X$ or, equivalently, the z-transform of the
stationary distribution $\{\pi_0, \pi_1, \ldots\}$ is (compare to the derivation of (9.90))

\[ M_X(z) = \hat f_T(\lambda - \lambda z). \]
Thus, by (9.98),

\[ M_X(z) = \hat f_W(\lambda - \lambda z)\,\hat g(\lambda - \lambda z). \]
This formula and (9.96) yield the Laplace transform of $f_W(t)$:

\[ \hat f_W(r) = (1 - \rho)\,\frac{r}{\lambda\,\hat g(r) + r - \lambda}. \]


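By formulas (2.62) and (2.119), $E(W)$ and $Var(W)$ can be determined from $\hat f_W(r)$:

\[ E(W) = \frac{\lambda\,[(E(Z))^2 + Var(Z)]}{2\,(1 - \rho)}, \tag{9.99} \]

\[ Var(W) = \frac{\lambda^2\,[(E(Z))^2 + Var(Z)]^2}{4\,(1 - \rho)^2} + \frac{\lambda\,E(Z^3)}{3\,(1 - \rho)}. \]

The random number of busy servers $S$ has the stationary distribution
\[ P(S = 0) = \pi_0 = 1 - \rho, \qquad P(S = 1) = 1 - \pi_0 = \rho, \]
so that $E(S) = \rho$.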
The queue length is $L = X - S$. Hence, by (9.97),
\[ E(L) = \frac{\lambda^2\,[(E(Z))^2 + Var(Z)]}{2\,(1 - \rho)}. \tag{9.100} \]
Comparing (9.99) and (9.100) verifies Little's formula (9.88):
\[ E(L) = \lambda\,E(W). \]
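
The mean values (9.97), (9.99), and (9.100) depend on the service time distribution
only through $E(Z)$ and $Var(Z)$. The following Python sketch illustrates this; the
function name and the numerical values are hypothetical. It compares exponential and
constant service times with the same mean.

```python
def mg1_means(lam, mean_z, var_z):
    """Mean waiting time, queue length and system size of M/G/1/infinity."""
    rho = lam * mean_z                      # traffic intensity lambda / mu
    if rho >= 1.0:
        raise ValueError("a stationary distribution requires rho < 1")
    second_moment = mean_z**2 + var_z       # E(Z^2)
    mean_wait = lam * second_moment / (2.0 * (1.0 - rho))       # (9.99)
    mean_queue = lam**2 * second_moment / (2.0 * (1.0 - rho))   # (9.100)
    mean_system = rho + mean_queue                              # (9.97)
    return mean_wait, mean_queue, mean_system


# Exponential service with mu = 2: E(Z) = 0.5, Var(Z) = 0.25.
w_exp, l_exp, x_exp = mg1_means(1.0, 0.5, 0.25)
# Constant service time of the same mean: Var(Z) = 0.
w_det, l_det, x_det = mg1_means(1.0, 0.5, 0.0)
print(w_exp, w_det)
```

With constant service times the mean waiting time is half of that for exponential
service times of the same mean, since $E(Z^2)$ is halved.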


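Example 9.17  The use of the formula of Pollaczek-Khinchin is illustrated by assuming
that $Z$ has an exponential distribution:

\[ g(t) = \mu\,e^{-\mu t}, \quad t \ge 0. \]
By example 2.26 (page 101), the Laplace transform of $g(t)$ is

\[ \hat g(r) = \frac{\mu}{r + \mu} \quad \text{so that} \quad \hat g(\lambda - \lambda z) = \hat g(\lambda(1 - z)) = \frac{\mu}{\mu + \lambda - \lambda z}. \]

Inserting this in (9.96) gives

\[ M_X(z) = (1 - \rho)\,\frac{1 - z}{1 - \dfrac{z\,(\mu + \lambda - \lambda z)}{\mu}} = (1 - \rho)\,\frac{1 - z}{(1 - z)(1 - \rho z)} \]

so that, by the geometric series (formula (2.16), page 48),

\[ M_X(z) = (1 - \rho)\,\frac{1}{1 - \rho z} = \sum_{i=0}^{\infty} (1 - \rho)\,\rho^i z^i. \]

Hence,
\[ \pi_i = (1 - \rho)\,\rho^i; \quad i = 0, 1, \ldots \]

This confirms the result (9.80) for the M/M/s/∞-system with $s = 1$.  □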
9.7.3.3 GI/M/1/∞-System


In this single-server system, the interarrival times are given by an ordinary renewal
process $\{Y_1, Y_2, \ldots\}$, where the $Y_i$ are identically distributed as $Y$ with probability
density $f_Y(t)$ and finite mean value $E(Y) = 1/\lambda$. The service times are identically
exponentially distributed with parameter $\mu$. A customer leaves the system immediately
after completion of its service. If an arriving customer finds the server busy, it
joins the queue. The stochastic process $\{X(t), t \ge 0\}$, describing the development of
the number of customers in the system in time, need not be a homogeneous Markov
chain. However, as in the previous section, an embedded homogeneous discrete-time
Markov chain can be identified: The $n$th customer arrives at time

\[ T_n = \sum_{i=1}^{n} Y_i; \quad n = 1, 2, \ldots \]

Let $X_n$ denote the number of customers in the station immediately before arrival of
the $(n+1)$th customer (being served or waiting). Then $0 \le X_n \le n$, $n = 0, 1, \ldots$ The
discrete-time stochastic process $\{X_0, X_1, \ldots\}$ is a Markov chain with parameter space
$T = \{0, 1, \ldots\}$ and state space $Z = \{0, 1, \ldots\}$. Given that the system starts operating at
time $t = 0$, the initial distribution of this discrete-time Markov chain is $P(X_0 = 0) = 1$.


To obtain the transition probabilities of $\{X_0, X_1, \ldots\}$, let $D_n$ be the number of customers
leaving the station in the interval $[T_n, T_{n+1})$ of length $Y_{n+1}$. Then,

\[ X_n = X_{n-1} - D_n + 1 \ \text{ with } \ 0 \le D_n \le X_{n-1} + 1; \quad n = 1, 2, \ldots \]
By theorem 7.2, on condition $Y_{n+1} = t$, the random variable $D_n$ has a Poisson distribution
with parameter $\mu t$ if the server is busy throughout the interval $[T_n, T_{n+1})$.
Hence, for $i \ge 0$ and $1 \le j \le i + 1$,

\[ P(X_n = j \mid X_{n-1} = i,\ Y_{n+1} = t) = \frac{(\mu t)^{\,i+1-j}}{(i+1-j)!}\,e^{-\mu t}; \quad n = 1, 2, \ldots \]


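Consequently, the one-step transition probabilities
\[ p_{ij} = P(X_n = j \mid X_{n-1} = i); \quad i, j \in Z; \quad n = 1, 2, \ldots \]

of the Markov chain $\{X_0, X_1, \ldots\}$ are

\[ p_{ij} = \int_0^{\infty} \frac{(\mu t)^{\,i+1-j}}{(i+1-j)!}\,e^{-\mu t}\,f_Y(t)\,dt; \quad 1 \le j \le i + 1. \]

The normalizing condition yields

\[ p_{i0} = 1 - \sum_{j=1}^{i+1} p_{ij}. \]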
The transition probabilities $p_{ij}$ do not depend on $n$ so that $\{X_0, X_1, \ldots\}$ is a
homogeneous Markov chain. It is embedded in the original state process $\{X(t), t \ge 0\}$ since
\[ X_n = X(T_{n+1} - 0); \quad n = 0, 1, \ldots \]


Based on the embedded Markov chain $\{X_0, X_1, \ldots\}$, a detailed analysis of the queueing
system GI/M/1/∞ can be carried out analogously to the one of system M/G/1/∞.
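
The transition probabilities of the embedded chain can be evaluated numerically for any
given interarrival density. The Python sketch below is an illustration only: the function
names, the composite Simpson rule, and the truncation of the integral at a finite upper
limit are assumptions made here, not part of the text. For exponentially distributed
interarrival times the integral has the closed form $\lambda \mu^k/(\lambda+\mu)^{k+1}$ with $k = i + 1 - j$,
which serves as a check.

```python
from math import exp, factorial


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def transition_row(i, mu, density, upper=60.0):
    """Row i of the embedded chain: [p_i0, p_i1, ..., p_{i,i+1}]."""
    row = [0.0] * (i + 2)
    for j in range(1, i + 2):
        k = i + 1 - j  # number of departures between two successive arrivals
        row[j] = simpson(
            lambda t: (mu * t)**k / factorial(k) * exp(-mu * t) * density(t),
            0.0, upper)
    row[0] = 1.0 - sum(row[1:])  # normalizing condition
    return row


lam, mu = 1.0, 2.0                                   # hypothetical rates
row_2 = transition_row(2, mu, lambda t: lam * exp(-lam * t))
closed_form = [lam * mu**k / (lam + mu)**(k + 1) for k in (2, 1, 0)]
print(row_2, closed_form)
```

The upper integration limit must be large enough for the integrand to be negligible
beyond it; the value used here is adequate for the rates of this example only.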
9.7.4 Waiting-Loss Systems


9.7.4.1 M/M/s/m-System 


This system has $s$ servers and waiting capacity for $m$ customers, $m \ge 1$. A potential
customer who at arrival finds no idle server and the waiting capacity occupied is
lost, that is, such a customer leaves the system immediately after arrival.

The number of customers $X(t)$ in the system at time $t$ generates a birth and death
process $\{X(t), t \ge 0\}$ with state space $Z = \{0, 1, \ldots, s+m\}$ and transition rates

\[ \lambda_j = \lambda, \quad 0 \le j \le s + m - 1, \]

\[ \mu_j = \begin{cases} j\mu & \text{for } 1 \le j \le s, \\ s\mu & \text{for } s < j \le s + m. \end{cases} \]


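According to (9.60) and (9.61), the stationary state probabilities are

\[ \pi_j = \begin{cases} \dfrac{1}{j!}\,\rho^j\,\pi_0 & \text{for } 1 \le j \le s - 1, \\[2ex] \dfrac{1}{s!\,s^{\,j-s}}\,\rho^j\,\pi_0 & \text{for } s \le j \le s + m, \end{cases} \qquad \pi_0 = \left[\sum_{j=0}^{s-1} \frac{\rho^j}{j!} + \sum_{j=s}^{s+m} \frac{\rho^j}{s!\,s^{\,j-s}}\right]^{-1}. \]

The second series in $\pi_0$ can be summed up to obtain

\[ \pi_0 = \begin{cases} \left[\displaystyle\sum_{j=0}^{s-1} \frac{\rho^j}{j!} + \frac{\rho^s}{s!}\,\frac{1 - (\rho/s)^{m+1}}{1 - \rho/s}\right]^{-1} & \text{for } \rho \ne s, \\[3ex] \left[\displaystyle\sum_{j=0}^{s-1} \frac{\rho^j}{j!} + (m+1)\,\frac{s^s}{s!}\right]^{-1} & \text{for } \rho = s. \end{cases} \]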


The vacant probability $\pi_0$ is the probability that there is no customer in the system,
and $\pi_{s+m}$ is the loss probability, i.e., the probability that an arriving customer is lost
(rejected). The respective probabilities $\pi_F$ and $\pi_W$ that an arriving customer finds a
free (idle) server or waits for service are

\[ \pi_F = \sum_{i=0}^{s-1} \pi_i, \qquad \pi_W = \sum_{i=s}^{s+m-1} \pi_i. \]

Analogously to the loss system M/M/s/0, the mean number of busy servers is
\[ E(S) = \rho\,(1 - \pi_{s+m}). \]

Thus, the degree of server utilization is
\[ \eta = \rho\,(1 - \pi_{s+m})/s. \]
In the following example, the probabilities $\pi_0$ and $\pi_{s+m}$, which refer to a queueing
system with $s$ servers and waiting capacity for $m$ customers, are denoted as

\[ \pi_0(s, m) \quad \text{and} \quad \pi_{s+m}(s, m), \]

respectively.


Example 9.18  A filling station has $s = 8$ petrol pumps and waiting capacity for $m = 6$
cars. On average, 1.2 cars arrive at the filling station per minute. The mean time a car
occupies a petrol pump is 5 minutes. It is assumed that the filling station behaves like
an M/M/s/m-queueing system. Since $\lambda = 1.2$ and $\mu = 0.2$, the traffic intensity is $\rho = 6$.
The corresponding loss probability $\pi_{14} = \pi_{14}(8, 6)$ is

\[ \pi_{14}(8, 6) = \frac{6^{14}}{8^6\,8!}\,\pi_0(8, 6) = 0.0167. \]

From the normalizing condition,

\[ \pi_0(8, 6) = \left[\sum_{j=0}^{7} \frac{6^j}{j!} + \frac{6^8}{8!}\,\frac{1 - (6/8)^7}{1 - 6/8}\right]^{-1} = 0.00225. \]

Consequently, the average number of occupied petrol pumps is
\[ E(S) = 6\,(1 - 0.0167) = 5.9. \]


After having obtained these figures, the owner of the filling station considers 2 of
the 8 petrol pumps superfluous and has them pulled down. It is assumed that this
change does not influence the input flow so that cars continue to arrive with traffic
intensity $\rho = 6$. The corresponding loss probability $\pi_{12} = \pi_{12}(6, 6)$ becomes

\[ \pi_{12}(6, 6) = \frac{6^6}{6!}\,\pi_0(6, 6) = 0.1023. \]

Thus, about 10% of all arriving cars leave the station without having filled up. To
counter this drop, the owner provides waiting capacity for another 4 cars so that
$m = 10$. The corresponding loss probability $\pi_{16} = \pi_{16}(6, 10)$ is

\[ \pi_{16}(6, 10) = \frac{6^6}{6!}\,\pi_0(6, 10) = 0.0726. \]

The formula

\[ \pi_{6+m}(6, m) = \frac{6^6}{6!} \left[\sum_{j=0}^{5} \frac{6^j}{j!} + (m+1)\,\frac{6^6}{6!}\right]^{-1} \]

yields that additional waiting capacity for 51 cars has to be provided to equalize the
loss caused by reducing the number of pumps from 8 to 6. So, the decision of the
owner to pull down two of the pumps was surely not helpful.  □
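
The figures of example 9.18 can be checked with a few lines of code. The Python sketch
below (function names are hypothetical) builds the stationary distribution of the
M/M/s/m-system from the unnormalized weights $\rho^j/j!$ and $\rho^j/(s!\,s^{j-s})$ and then searches
for the waiting capacity that brings the 6-pump station back to the loss level of the
original 8-pump station.

```python
from math import factorial


def mmsm_distribution(rho, s, m):
    """Stationary probabilities pi_0..pi_{s+m} of the M/M/s/m system."""
    weights = [rho**j / factorial(j) for j in range(s)]
    weights += [rho**j / (factorial(s) * s**(j - s))
                for j in range(s, s + m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def loss_probability(rho, s, m):
    """Probability pi_{s+m} that an arriving customer is rejected."""
    return mmsm_distribution(rho, s, m)[-1]


rho = 6.0
loss_8_6 = loss_probability(rho, 8, 6)     # 8 pumps, 6 waiting places
loss_6_6 = loss_probability(rho, 6, 6)     # 6 pumps, 6 waiting places
loss_6_10 = loss_probability(rho, 6, 10)   # 6 pumps, 10 waiting places

# Smallest waiting capacity for 6 pumps that is no worse than 8 pumps with m = 6.
m_needed = 6
while loss_probability(rho, 6, m_needed) > loss_8_6:
    m_needed += 1
extra_places = m_needed - 6
print(loss_8_6, loss_6_6, loss_6_10, m_needed, extra_places)
```

Building the distribution from unnormalized weights avoids the case distinction
between $\rho \ne s$ and $\rho = s$ in the closed form for $\pi_0$.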
9.7.4.2 M/M/s/∞-System with Impatient Customers


Even if there is waiting capacity for arbitrarily many customers, some customers
might leave the system without having been served. This happens when customers
can only spend a finite time, their patience time, in the queue. If the service of a
customer does not begin before its patience time expires, the customer leaves the system.
For example, if somebody whose long-distance train will depart in 10 minutes has
to wait 15 minutes to buy a ticket, then this person will leave the counter without a
ticket. Real-time monitoring and control systems have memories for data to be
processed. But these data 'wait' only as long as they are up to date. Bounded waiting
times are also typical for packet switching systems, for instance in computer-aided
booking systems. Generally one expects that 'intelligent' customers adapt their behavior
to the actual state of the queueing system. Of the many available models dealing
with such situations, the following one is considered in some detail:


Customers arriving at an M/M/s/∞-system have independent patience times, which are
exponentially distributed with parameter $\nu$. If $X(t)$ as usual denotes the number of customers
in the system at time $t$, then $\{X(t), t \ge 0\}$ is a birth and death process with transition
rates

\[ \lambda_j = \lambda; \quad j = 0, 1, \ldots, \]

\[ \mu_j = \begin{cases} j\mu & \text{for } j = 1, 2, \ldots, s, \\ s\mu + (j - s)\,\nu & \text{for } j = s, s+1, \ldots \end{cases} \]


If $j \to \infty$, then $\mu_j \to \infty$, whereas the birth rate remains constant. Hence the sufficient
condition for the existence of a stationary distribution stated in theorem 9.3 (page 419)
is fulfilled. Once the queue length exceeds a certain level, the number of customers
leaving the system is on average greater than the number of arriving customers per
unit time. That is, the system is self-regulating, aiming at reaching the equilibrium
state. Now formulas (9.60) and (9.61) yield the corresponding stationary state
probabilities:

\[ \pi_j = \begin{cases} \dfrac{1}{j!}\,\rho^j\,\pi_0 & \text{for } j = 1, 2, \ldots, s, \\[2ex] \dfrac{\rho^s}{s!}\,\dfrac{\lambda^{\,j-s}}{\prod_{i=1}^{j-s} (s\mu + i\nu)}\,\pi_0 & \text{for } j = s+1, s+2, \ldots \end{cases} \]

\[ \pi_0 = \left[\sum_{j=0}^{s} \frac{1}{j!}\,\rho^j + \frac{\rho^s}{s!} \sum_{j=s+1}^{\infty} \frac{\lambda^{\,j-s}}{\prod_{i=1}^{j-s} (s\mu + i\nu)}\right]^{-1}. \]


Let $L$ denote the random length of the queue in the steady state. Then,
\[ E(L) = \sum_{j=s}^{\infty} (j - s)\,\pi_j. \]
Inserting the $\pi_j$ yields after some algebra
\[ E(L) = \pi_s \sum_{j=1}^{\infty} j\,\lambda^j \left[\prod_{i=1}^{j} (s\mu + i\nu)\right]^{-1}. \]


In this model, the loss probability $\pi_v$ is not strictly associated with the number of
customers in the system. It is the probability that a customer leaves the system without
having been served, because its patience time has expired. Therefore, $1 - \pi_v$ is the
probability that a customer leaves the system after having been served. By applying
the total probability rule with the exhaustive and mutually exclusive set of random
events '$X = j$'; $j = s, s+1, \ldots$, one obtains

\[ E(L) = \frac{\lambda}{\nu}\,\pi_v. \]

Thus, the mean queue length is directly proportional to the loss probability (compare
to Little's formula (9.88)).
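
Since the stationary probabilities of this model involve an infinite series, a numerical
evaluation has to truncate the state space. The Python sketch below is an illustration
under that assumption (function name, tolerance, and parameter values are hypothetical).
It builds the unnormalized probabilities from the birth and death rates until they become
negligible, and obtains the loss probability from $E(L) = (\lambda/\nu)\,\pi_v$.

```python
def impatient_queue(lam, mu, nu, s, tol=1e-16):
    """Truncated steady state of M/M/s/infinity with patience rate nu > 0."""
    weights = [1.0]  # unnormalized pi_j / pi_0
    j = 0
    while j <= s or weights[-1] > tol:
        j += 1
        death = j * mu if j <= s else s * mu + (j - s) * nu
        weights.append(weights[-1] * lam / death)
    total = sum(weights)
    pi = [w / total for w in weights]
    mean_queue = sum((k - s) * p for k, p in enumerate(pi) if k > s)
    mean_busy = sum(min(k, s) * p for k, p in enumerate(pi))
    loss_prob = nu * mean_queue / lam  # from E(L) = (lam / nu) * pi_v
    return pi, mean_queue, mean_busy, loss_prob


# Hypothetical values: lambda = 3, mu = 1, nu = 0.5, s = 2 servers.
pi, mean_queue, mean_busy, loss_prob = impatient_queue(3.0, 1.0, 0.5, 2)
flow_gap = 3.0 - (1.0 * mean_busy + 0.5 * mean_queue)
print(mean_queue, loss_prob, flow_gap)
```

In the steady state the arrival rate must equal the rate of service completions plus the
rate at which impatient customers leave, so `flow_gap` should vanish up to the truncation
error.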


Variable Arrival Intensity  Finite waiting capacities and patience times imply that
in the end only a 'thinned flow' of potential customers will be served. Thus, it seems
to be appropriate to investigate queueing systems whose arrival (input) intensities
depend on the state of the system. Those customers, however, which actually enter
the system do not leave it without service. Since the tendency of customers to leave
the system immediately after arrival increases with the number of customers in the
system, the birth rates should decrease for $j \ge s$ as $j$ tends to infinity. For
example, for $\alpha \ge 0$, the following birth rates have this property:

\[ \lambda_j = \begin{cases} \lambda & \text{for } j = 0, 1, \ldots, s-1, \\[1ex] \dfrac{s}{j + \alpha}\,\lambda & \text{for } j = s, s+1, \ldots \end{cases} \]


9.7.5 Special Single-Server Queueing Systems 


9.7.5.1 System with Priorities 


A single-server queueing system with waiting capacity for $m = 1$ customer is subject
to two independent Poisson inputs 1 and 2 with respective intensities $\lambda_1$ and $\lambda_2$. The
corresponding customers are called type 1- and type 2-customers. Type 1-customers
have absolute (preemptive) priority, i.e., when a type 1- and a type 2-customer are in
the system, the type 1-customer is being served. Thus, the service of a type 2-customer
is interrupted as soon as a type 1-customer arrives. The displaced customer will
occupy the waiting facility if it is empty. Otherwise it leaves the system. A waiting
type 2-customer also has to leave the system when a type 1-customer arrives, since
the newcomer will occupy the waiting facility. (Such a situation can only happen
when a type 1-customer is being served.) An arriving type 1-customer is lost only
when both server and waiting facility are occupied by other type 1-customers.
Figure 9.14 Transition graph for a single-server priority queueing system with m = 1


Thus, if only the number of type 1-customers in the system is of interest, then this
priority queueing system becomes the waiting-loss system M/M/s/1 with $s = 1$, since
type 2-customers have no impact on the service of type 1-customers at all. The service
times of type 1- and type 2-customers are assumed to have exponential distributions
with respective parameters $\mu_1$ and $\mu_2$. The state space of the system is represented
in the form

\[ Z = \{(i, j);\ i, j = 0, 1, 2;\ i + j \le 2\}, \]

where $i$ denotes the number of type 1-customers and $j$ the number of type 2-customers
in the system. Note that if $X(t)$ denotes the system state at time $t$, the stochastic
process $\{X(t), t \ge 0\}$ can be treated as a one-dimensional Markov chain, since scalars
can be assigned to the six possible system states, which are given as two-component
vectors. The Markov chain $\{X(t), t \ge 0\}$ is, however, not a birth and death process.
Figure 9.14 shows its transition graph.


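According to (9.28), the stationary state probabilities satisfy the system of equations

\[
\begin{aligned}
(\lambda_1 + \lambda_2)\,\pi_{(0,0)} &= \mu_1 \pi_{(1,0)} + \mu_2 \pi_{(0,1)} \\
(\lambda_1 + \lambda_2 + \mu_1)\,\pi_{(1,0)} &= \lambda_1 \pi_{(0,0)} + \mu_1 \pi_{(2,0)} \\
(\lambda_1 + \lambda_2 + \mu_2)\,\pi_{(0,1)} &= \lambda_2 \pi_{(0,0)} + \mu_1 \pi_{(1,1)} + \mu_2 \pi_{(0,2)} \\
(\lambda_1 + \mu_1)\,\pi_{(1,1)} &= \lambda_2 \pi_{(1,0)} + \lambda_1 \pi_{(0,1)} + \lambda_1 \pi_{(0,2)} \\
\mu_1 \pi_{(2,0)} &= \lambda_1 \pi_{(1,0)} + \lambda_1 \pi_{(1,1)} \\
(\lambda_1 + \mu_2)\,\pi_{(0,2)} &= \lambda_2 \pi_{(0,1)} \\
\pi_{(0,0)} + \pi_{(1,0)} + \pi_{(0,1)} + \pi_{(1,1)} + \pi_{(2,0)} + \pi_{(0,2)} &= 1.
\end{aligned}
\]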

m = 0  Since there is no waiting capacity, each customer, notwithstanding its type, is
lost if the server is busy with a type 1-customer. In addition, a type 2-customer is lost
if, while being served, a type 1-customer arrives. The state space is

\[ Z = \{(0, 0), (0, 1), (1, 0)\}. \]

Figure 9.15 Transition graph for a 1-server priority loss system


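Figure 9.15 shows the transition rates. The corresponding system (9.27) for the
stationary state probabilities is

\[
\begin{aligned}
(\lambda_1 + \lambda_2)\,\pi_{(0,0)} &= \mu_1 \pi_{(1,0)} + \mu_2 \pi_{(0,1)} \\
\mu_1 \pi_{(1,0)} &= \lambda_1 \pi_{(0,0)} + \lambda_1 \pi_{(0,1)} \\
1 &= \pi_{(0,0)} + \pi_{(1,0)} + \pi_{(0,1)}.
\end{aligned}
\]
The solution is

\[ \pi_{(0,0)} = \frac{\mu_1 (\lambda_1 + \mu_2)}{(\lambda_1 + \mu_1)(\lambda_1 + \lambda_2 + \mu_2)}, \qquad \pi_{(0,1)} = \frac{\lambda_2\,\mu_1}{(\lambda_1 + \mu_1)(\lambda_1 + \lambda_2 + \mu_2)}, \qquad \pi_{(1,0)} = \frac{\lambda_1}{\lambda_1 + \mu_1}. \]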

$\pi_{(1,0)}$ is the loss probability for type 1-customers. It is simply the probability that the
service time of type 1-customers is greater than their interarrival time. On condition
that at the arrival time of a type 2-customer the server is idle, this customer is lost if
and only if during its service a type 1-customer arrives. The conditional probability
of this event is

\[ \int_0^{\infty} e^{-\mu_2 t}\,\lambda_1 e^{-\lambda_1 t}\,dt = \lambda_1 \int_0^{\infty} e^{-(\lambda_1 + \mu_2)\,t}\,dt = \frac{\lambda_1}{\lambda_1 + \mu_2}. \]

Therefore, the (total) loss probability for type 2-customers is

\[ \pi_l = \frac{\lambda_1}{\lambda_1 + \mu_2}\,\pi_{(0,0)} + \pi_{(0,1)} + \pi_{(1,0)}. \]


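Example 9.19  Let $\lambda_1 = 0.1$, $\lambda_2 = 0.2$, and $\mu_1 = \mu_2 = 0.2$. Then, for $m = 1$, solving the
above system of six balance equations gives the stationary state probabilities

\[ \pi_{(0,0)} = 0.2105, \quad \pi_{(0,1)} = 0.2165, \quad \pi_{(1,0)} = 0.0992, \]
\[ \pi_{(1,1)} = 0.1865, \quad \pi_{(0,2)} = 0.1444, \quad \pi_{(2,0)} = 0.1429. \]

As a check, the probabilities of finding 0, 1, and 2 type 1-customers in the system are
4/7, 2/7, and 1/7, as required for an M/M/1/1-system with traffic intensity $\lambda_1/\mu_1 = 0.5$.
In case $m = 0$, with the same numerical values for the transition rates,

\[ \pi_{(0,0)} = 0.4000, \quad \pi_{(1,0)} = 0.3333, \quad \pi_{(0,1)} = 0.2667. \]

The loss probability for type 2-customers is $\pi_l = 0.7333$.  □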
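
Balance equations of this kind are conveniently solved numerically. The Python sketch
below is an illustration only: the helper `stationary` and the dictionaries of transition
rates are hypothetical names, and the rates are read off the model description (arrivals
with rates $\lambda_1$, $\lambda_2$, service completions with rates $\mu_1$, $\mu_2$, preemption by
type 1-customers). The helper replaces one balance equation by the normalizing condition
and solves the linear system by Gauss-Jordan elimination.

```python
def stationary(states, rates):
    """Stationary distribution of a finite continuous-time Markov chain.

    rates maps (from_state, to_state) to a positive transition rate.
    """
    n = len(states)
    index = {state: k for k, state in enumerate(states)}
    # Row k is the balance equation of state k: inflow - outflow = 0.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for (src, dst), rate in rates.items():
        a[index[dst]][index[src]] += rate
        a[index[src]][index[src]] -= rate
    a[-1] = [1.0] * n + [1.0]  # normalizing condition replaces one equation
    for col in range(n):       # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return {state: a[k][n] / a[k][k] for k, state in enumerate(states)}


l1, l2, m1, m2 = 0.1, 0.2, 0.2, 0.2   # values of example 9.19

rates_m1 = {  # waiting capacity m = 1
    ((0, 0), (1, 0)): l1, ((0, 0), (0, 1)): l2,
    ((1, 0), (2, 0)): l1, ((1, 0), (1, 1)): l2, ((1, 0), (0, 0)): m1,
    ((0, 1), (1, 1)): l1, ((0, 1), (0, 2)): l2, ((0, 1), (0, 0)): m2,
    ((1, 1), (2, 0)): l1, ((1, 1), (0, 1)): m1,
    ((2, 0), (1, 0)): m1,
    ((0, 2), (1, 1)): l1, ((0, 2), (0, 1)): m2,
}
pi_m1 = stationary([(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2)], rates_m1)

rates_m0 = {  # loss system, m = 0
    ((0, 0), (1, 0)): l1, ((0, 0), (0, 1)): l2,
    ((0, 1), (1, 0)): l1, ((0, 1), (0, 0)): m2,
    ((1, 0), (0, 0)): m1,
}
pi_m0 = stationary([(0, 0), (0, 1), (1, 0)], rates_m0)
loss_type2 = l1 / (l1 + m2) * pi_m0[(0, 0)] + pi_m0[(0, 1)] + pi_m0[(1, 0)]
print(pi_m1, pi_m0, loss_type2)
```

A useful sanity check for $m = 1$ is that the number of type 1-customers alone follows an
M/M/1/1-system, so that the probability of state (2,0) must equal $1/7$ for the rates used
here.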
9.7.5.2 M/M/1/m-System with Unreliable Server


If the implications of server failures on the system performance are not negligible,
server failures have to be taken into account when building up a mathematical model.
In what follows, the principal approach is illustrated by a single-server queueing system
with waiting capacity for $m$ customers, Poisson input, and independent, identically
distributed exponential service times with parameter $\mu$. The lifetime of the server is
assumed to have an exponential distribution with parameter $\alpha$, both in its busy phase
and in its idle phase, and the subsequent renewal time of the server is assumed to be
exponentially distributed with parameter $\beta$. It is further assumed that the sequence of
life- and renewal times of the server can be described by an alternating renewal process.
When the server fails, all customers leave the system, i.e., the customer being
served and the waiting customers, if there are any, are lost. Customers arriving during
a renewal phase of the server are rejected, i.e., they are lost, too.

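The stochastic process $\{X(t), t \ge 0\}$ describing the behaviour of the system is
characterized as follows:

\[ X(t) = \begin{cases} j & \text{if there are } j \text{ customers in the system at time } t;\ j = 0, 1, \ldots, m+1, \\ m+2 & \text{if the server is being renewed at time } t. \end{cases} \]
Its transition rates are (Figure 9.16):
\[
\begin{aligned}
q_{j,j+1} &= \lambda; \quad j = 0, 1, \ldots, m \\
q_{j,j-1} &= \mu; \quad j = 1, 2, \ldots, m+1 \\
q_{j,m+2} &= \alpha; \quad j = 0, 1, \ldots, m+1 \\
q_{m+2,0} &= \beta.
\end{aligned}
\tag{9.101}
\]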
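By (9.28), the stationary state probabilities satisfy the system of equations

\[
\begin{aligned}
(\alpha + \lambda)\,\pi_0 &= \mu \pi_1 + \beta \pi_{m+2} \\
(\alpha + \lambda + \mu)\,\pi_j &= \lambda \pi_{j-1} + \mu \pi_{j+1}; \quad j = 1, 2, \ldots, m \\
(\alpha + \mu)\,\pi_{m+1} &= \lambda \pi_m \\
\beta \pi_{m+2} &= \alpha \pi_0 + \alpha \pi_1 + \cdots + \alpha \pi_{m+1}.
\end{aligned}
\tag{9.102}
\]

Figure 9.16 Transition graph of a queueing system with unreliable server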
The last equation is equivalent to $\beta \pi_{m+2} = \alpha (1 - \pi_{m+2})$. Hence,
\[ \pi_{m+2} = \frac{\alpha}{\alpha + \beta}. \]

Now, starting with the first equation in (9.102), the stationary state probabilities of
the system $\pi_1, \pi_2, \ldots, \pi_{m+1}$ can be successively determined. The probability $\pi_0$ is as
usual obtained from the normalizing condition

\[ \sum_{i=0}^{m+2} \pi_i = 1. \tag{9.103} \]
For the corresponding loss system ($m = 0$), the stationary state probabilities are

\[ \pi_0 = \frac{\beta (\alpha + \mu)}{(\alpha + \beta)(\alpha + \lambda + \mu)}, \qquad \pi_1 = \frac{\beta \lambda}{(\alpha + \beta)(\alpha + \lambda + \mu)}, \qquad \pi_2 = \frac{\alpha}{\alpha + \beta}. \]
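
The successive determination of the stationary probabilities from (9.102) can be sketched
in code. In the following Python illustration (the function name and the parameter values
are hypothetical), each $\pi_j$ is written as $c_j \pi_0 + d_j$; the coefficients follow from the
first equations of (9.102), and $\pi_0$ then follows from the normalizing condition (9.103).
The equation for state $m+1$ is not used in the recursion and serves as a check.

```python
def unreliable_server(lam, mu, alpha, beta, m):
    """Stationary probabilities pi_0..pi_{m+2} of the system (9.102)."""
    p_renewal = alpha / (alpha + beta)           # pi_{m+2}
    # Represent pi_j = c[j] * pi_0 + d[j] for j = 0, ..., m+1.
    c, d = [1.0], [0.0]
    # First equation: mu * pi_1 = (alpha + lam) * pi_0 - beta * pi_{m+2}.
    c.append((alpha + lam) / mu)
    d.append(-beta * p_renewal / mu)
    for j in range(1, m + 1):
        # mu * pi_{j+1} = (alpha + lam + mu) * pi_j - lam * pi_{j-1}
        c.append(((alpha + lam + mu) * c[j] - lam * c[j - 1]) / mu)
        d.append(((alpha + lam + mu) * d[j] - lam * d[j - 1]) / mu)
    pi_0 = (1.0 - p_renewal - sum(d)) / sum(c)   # normalizing condition (9.103)
    return [cj * pi_0 + dj for cj, dj in zip(c, d)] + [p_renewal]


lam, mu, alpha, beta = 1.0, 2.0, 0.1, 0.5        # hypothetical rates
pi_loss = unreliable_server(lam, mu, alpha, beta, m=0)
pi_wait = unreliable_server(lam, mu, alpha, beta, m=3)
boundary_gap = (alpha + mu) * pi_wait[4] - lam * pi_wait[3]
print(pi_loss, pi_wait, boundary_gap)
```

The forward recursion is adequate for small $m$; for large waiting capacities a direct
solution of the linear system may be numerically preferable.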


Modification of the Model  It makes sense to assume that the server can only fail if
it is busy. In this case,

\[ q_{j,m+2} = \alpha \quad \text{for } j = 1, 2, \ldots, m+1. \]


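The other transition rates given by (9.101) remain valid. Thus, the corresponding
transition graph is again given by Figure 9.16 with the arrow from node 0 to node
$m+2$ deleted. The stationary state probabilities satisfy the system of equations

\[
\begin{aligned}
\lambda \pi_0 &= \mu \pi_1 + \beta \pi_{m+2} \\
(\alpha + \lambda + \mu)\,\pi_j &= \lambda \pi_{j-1} + \mu \pi_{j+1}; \quad j = 1, 2, \ldots, m \\
(\alpha + \mu)\,\pi_{m+1} &= \lambda \pi_m \\
\beta \pi_{m+2} &= \alpha \pi_1 + \alpha \pi_2 + \cdots + \alpha \pi_{m+1}.
\end{aligned}
\tag{9.104}
\]

The last equation is equivalent to $\beta \pi_{m+2} = \alpha (1 - \pi_0 - \pi_{m+2})$. It follows that
\[ \pi_{m+2} = \frac{\alpha}{\alpha + \beta}\,(1 - \pi_0). \]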


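Starting with the first equation in (9.104), the solution $\pi_0, \pi_1, \pi_2, \ldots, \pi_{m+1}$ can be
obtained as above. In case $m = 0$ the stationary state probabilities are

\[ \pi_0 = \frac{\beta (\alpha + \mu)}{\beta (\alpha + \mu) + \lambda (\alpha + \beta)}, \qquad \pi_1 = \frac{\lambda \beta}{\beta (\alpha + \mu) + \lambda (\alpha + \beta)}, \qquad \pi_2 = \frac{\lambda \alpha}{\beta (\alpha + \mu) + \lambda (\alpha + \beta)}. \]
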
Comment It is interesting that this queueing system with unreliable server can be 
interpreted as a queueing system with priorities and absolutely reliable server. To see 
this, a failure of the server has to be declared as the arrival of a 'customer' with abso- 
lute priority. The service provided to this 'customer' consists in the renewal of the ser- 
ver. Such a ‘customer’ pushes away any other customer from the server, in this model 
even from the waiting facility. Hence it is not surprising that the theory of queueing 
systems with priorities also provides solutions for more complicated queuing systems 
with unreliable servers than the one considered in this section. 
9.7.6 Networks of Queueing Systems


9.7.6.1 Introduction 


Customers frequently need several kinds of service so that, after leaving one service 
station, they have to visit one or more other service stations in a fixed or random 
order. Each of these service stations is assumed to behave like the basic queueing 
system sketched in Figure 9.12. A set of queueing systems together with rules of their 
interactions is called a network of queueing systems or a queueing network. Typical 
examples are technological processes for manufacturing (semi-) finished products. In 
such a case the order of service by different queueing systems is usually fixed. Queu- 
ing systems are frequently subject to several inputs, i.e., customers with different ser- 
vice requirements have to be attended. In this case they may visit the service stations 
in different orders. Examples of such situations are computer and communication 
networks. Depending on whether and how data are to be provided, processed, or 
transmitted, the terminals (service stations) will be used in different orders. If techni- 
cal systems have to be repaired, then, depending on the nature and the extent of the 
damage, service by different production departments within a workshop is needed. 
Transport and loading systems also fit into the scheme of queueing networks. 


Using a concept from graph theory, the service stations of a queueing network are 
called nodes. In an open queueing network customers arrive from 'outside' at the sys- 
tem (external input). Each node may have its own external input. Once in the system, 
customers visit other nodes in a deterministic or random order before leaving the 
network. Thus, in an open network, each node may have to serve external and inter- 
nal customers, where internal customers are the ones which arrive from other nodes. 
In closed queueing networks there are no external inputs into the nodes, and the total 
number of customers in the network is constant. Consequently, no customer departs 
from the network. Queueing networks can be represented by directed graphs. The 
directed edges between the nodes symbolize the possible transitions of customers 
from one node to another. The nodes in the network are denoted by 1, 2, ..., n. Node i
is assumed to have $s_i$ servers, $1 \le s_i \le \infty$.


9.7.6.2 Open Queueing Networks 


A mathematically exact analysis of queueing systems becomes extremely difficult or 
even impossible when dropping the assumptions of Poisson input and/or exponential- 
ly distributed service times. Hence, this section is restricted to a rather simple class 
of queueing networks, the Jackson queueing networks. They are characterized by four 
properties: 

1) Each node has an unbounded waiting capacity. 


2) The service times of all servers at node i are independent, identically distributed
exponential random variables with parameter (intensity) $\mu_i$. They are also independent
of the service times at other nodes.

3) External customers arrive at node i in accordance with a homogeneous Poisson
process with intensity $\lambda_i$. All external inputs are independent of each other and of all
service times.


4) When the service of a customer at node i has been finished, the customer makes a
transition to node j with probability $p_{ij}$ or leaves the network with probability $a_i$.
The transition or routing matrix

$$P = ((p_{ij}))$$

is independent of the current state of the network and of its past.

Let I be the identity matrix. The matrix $I - P$ is assumed to be nonsingular so that the
inverse matrix $(I - P)^{-1}$ exists. According to the definition of the $a_i$ and $p_{ij}$,

$$a_i + \sum_{j=1}^{n} p_{ij} = 1, \quad i = 1, 2, \ldots, n. \qquad (9.105)$$

In a Jackson queueing network, each node is in principle subject to both external
and internal input. Let $\alpha_j$ be the total input (arrival) intensity at node j. In the steady
state, $\alpha_j$ must be equal to the total output intensity from node j. The portion of the
internal input intensity to node j that is due to customers from node i is $\alpha_i\,p_{ij}$. Thus,

$$\sum_{i=1}^{n} \alpha_i\,p_{ij}$$

is the total internal input intensity to node j. Consequently, in the steady state,

$$\alpha_j = \lambda_j + \sum_{i=1}^{n} \alpha_i\,p_{ij}, \quad j = 1, 2, \ldots, n. \qquad (9.106)$$


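By introducing vectors

$$\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n) \quad\text{and}\quad \boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_n),$$

the relationship (9.106) can be written as

$$\boldsymbol{\alpha}\,(I - P) = \boldsymbol{\lambda}.$$

Since $I - P$ is assumed to be nonsingular, the vector of the total input intensities $\boldsymbol{\alpha}$ is

$$\boldsymbol{\alpha} = \boldsymbol{\lambda}\,(I - P)^{-1}. \qquad (9.107)$$
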
Even under the assumptions stated, the total inputs at the nodes and the outputs from 
the nodes are generally nonhomogeneous Poisson processes. 
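
The traffic equations (9.106) are a linear system and can be solved numerically. The following Python sketch is an illustration only and not part of the original derivation: the function name `solve_traffic_equations` and the two-node network with feedback used to try it out are assumptions made for this example. It solves $\boldsymbol{\alpha}(I-P)=\boldsymbol{\lambda}$ by Gaussian elimination with partial pivoting.

```python
def solve_traffic_equations(lam, P):
    """Solve alpha = lam + alpha * P, i.e. alpha (I - P) = lam, for alpha.

    lam : list of external input intensities lambda_1..lambda_n
    P   : routing matrix as a list of rows, P[i][j] = p_ij
    """
    n = len(lam)
    # Row i of the augmented matrix encodes
    #   sum_j (delta_ij - p_ji) * alpha_j = lambda_i,
    # which is equation (9.106) for node i.
    A = [[(1.0 if i == j else 0.0) - P[j][i] for j in range(n)] + [lam[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("I - P is (numerically) singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical two-node network: external input only at node 1,
# node 1 always feeds node 2, node 2 sends half of its customers back.
lam_demo = [1.0, 0.0]
P_demo = [[0.0, 1.0],
          [0.5, 0.0]]
alpha_demo = solve_traffic_equations(lam_demo, P_demo)
```

In this made-up network every customer visits each node twice on average, so both total input intensities are twice the external intensity.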
Let $X_i(t)$ be the random number of customers at node i at time t. Its realizations are
denoted as $x_i$; $x_i = 0, 1, \ldots$. The random state of the network at time t is characterized
by the vector $X(t) = (X_1(t), X_2(t), \ldots, X_n(t))$ with realizations $x = (x_1, x_2, \ldots, x_n)$. The
set of all these vectors x forms the state space of the Markov chain $\{X(t), t \ge 0\}$.

Using set-theory notation, the state space is denoted as $Z = \{0, 1, \ldots\}^n$, i.e., Z is the
set of all n-dimensional vectors whose components are nonnegative
integers. Since Z is countably infinite, this at first glance n-dimensional Markov
chain becomes one-dimensional by arranging the states as a sequence.


To determine the transition rates of $\{X(t), t \ge 0\}$, the n-dimensional vector $e_i$ is
introduced. Its ith component is 1 and the other components are zeros:

$$e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0) \quad (\text{the 1 is in position } i). \qquad (9.108)$$

Thus, $e_i$ is the ith row of the identity matrix I. Since the components of any state
vector x are nonnegative integers, each x can be represented as a linear combination
of all or some of the $e_1, e_2, \ldots, e_n$. In particular, $x + e_i$ ($x - e_i$) is the vector which
arises from x by increasing (decreasing) the ith component by 1. Starting from state
x, the Markov chain $\{X(t), t \ge 0\}$ can make the following one-step transitions:


1) When an external customer arrives at node i, the Markov chain makes a transition
to state $x + e_i$.

2) When a service at node i is finished, $x_i > 0$, and the served customer leaves the
network, the Markov chain makes a transition to state $x - e_i$.

3) When a service at node i with $x_i > 0$ is finished and the served customer leaves
node i for node j, the Markov chain makes a transition to state $x - e_i + e_j$.


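Therefore, starting from state $x = (x_1, x_2, \ldots, x_n)$, the transition rates are

$$q_{x,\,x+e_i} = \lambda_i,$$
$$q_{x,\,x-e_i} = \min(x_i, s_i)\,\mu_i\,a_i,$$
$$q_{x,\,x-e_i+e_j} = \min(x_i, s_i)\,\mu_i\,p_{ij}, \quad i \ne j.$$

In view of (9.105),

$$\sum_{j,\,j \ne i} p_{ij} = 1 - p_{ii} - a_i.$$

Hence, the rate of leaving state x is

$$q_x = \sum_{i=1}^{n} \lambda_i + \sum_{i=1}^{n} \mu_i\,(1 - p_{ii})\,\min(x_i, s_i).$$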
According to (9.28), the stationary state probabilities

$$\pi_x = \lim_{t \to \infty} P(X(t) = x), \quad x \in Z,$$

provided they exist, satisfy the system of equations

$$q_x\,\pi_x = \sum_{i=1}^{n} \lambda_i\,\pi_{x-e_i} + \sum_{i=1}^{n} a_i\,\mu_i \min(x_i + 1, s_i)\,\pi_{x+e_i}
+ \sum_{j=1}^{n} \sum_{i=1,\, i \ne j}^{n} \mu_i \min(x_i + 1, s_i)\,p_{ij}\,\pi_{x+e_i-e_j}. \qquad (9.109)$$

In order to be able to present the solution of this system in a convenient form, recall
that the stationary state probabilities of the waiting system M/M/$s_i$/$\infty$ with parameters
$\alpha_i$, $\mu_i$, and $\rho_i = \alpha_i/\mu_i$, denoting in this order the intensity of the Poisson input,
the service intensities of all servers, and the traffic intensity of the system, are given
by (see formula (9.80))

$$\varphi_i(j) = \begin{cases}
\dfrac{1}{j!}\,\rho_i^{\,j}\,\varphi_i(0) & \text{for } j = 1, 2, \ldots, s_i - 1, \\[2mm]
\dfrac{1}{s_i!\,s_i^{\,j-s_i}}\,\rho_i^{\,j}\,\varphi_i(0) & \text{for } j = s_i, s_i + 1, \ldots,
\end{cases} \qquad \rho_i < s_i,$$

$$\varphi_i(0) = \left[\,\sum_{j=0}^{s_i-1} \frac{1}{j!}\,\rho_i^{\,j} + \frac{\rho_i^{\,s_i}}{(s_i - 1)!\,(s_i - \rho_i)}\right]^{-1}, \qquad \rho_i < s_i.$$

(In the context of queueing networks, the notation $\varphi_i(\cdot)$ for the stationary state
probabilities is common practice.) The stationary state probabilities of the queueing
network are simply obtained by multiplying the corresponding state probabilities of the
queueing systems M/M/$s_i$/$\infty$, $i = 1, 2, \ldots, n$:

If the vector of the total input intensities $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ given by (9.106)
satisfies the conditions $\alpha_i < s_i\,\mu_i$, $i = 1, 2, \ldots, n$, then the stationary probability
of state $x = (x_1, x_2, \ldots, x_n)$ is

$$\pi_x = \prod_{i=1}^{n} \varphi_i(x_i), \quad x \in Z. \qquad (9.110)$$


Thus, the stationary state distribution of a Jackson queueing network is given in product
form. This implies that each node of the network behaves like an M/M/$s_i$/$\infty$-system.
However, the nodes need not be queueing systems of this type because the
process $\{X_i(t), t \ge 0\}$ is usually not a birth and death process. In particular, the total
input into a node need not be a homogeneous Poisson process. But the product form
(9.110) of the stationary state probabilities proves that the queue lengths at the nodes 
in the steady state are independent random variables. There is a vast amount of litera- 
ture dealing with assumptions under which the stationary distribution of a queueing 
network has the product form (see, for instance, van Dijk (1983)). 


To verify that the stationary state distribution indeed has the product form (9.110), 
one has to substitute (9.110) into the system of equations (9.109). Using (9.105) and 
(9.106), one obtains an identity after some tedious algebra. 
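
For numerical work, the product form (9.110) can be evaluated directly from the M/M/s formulas quoted above. The following Python sketch is an illustration under stated assumptions, not the book's own algorithm: the function names `mms_probability` and `jackson_probability` are chosen here for the example, and the traffic intensities $\rho_i = \alpha_i/\mu_i$ are assumed to have been obtained from (9.106) beforehand.

```python
from math import factorial


def mms_probability(j, rho, s):
    """Stationary probability phi(j) of j customers in an M/M/s/infinity system.

    rho = alpha / mu is the traffic intensity; a stationary distribution
    requires rho < s.
    """
    if rho >= s:
        raise ValueError("no stationary distribution: rho must be smaller than s")
    phi0 = 1.0 / (sum(rho**k / factorial(k) for k in range(s))
                  + rho**s / (factorial(s - 1) * (s - rho)))
    if j < s:
        return rho**j / factorial(j) * phi0
    return rho**j / (factorial(s) * s**(j - s)) * phi0


def jackson_probability(x, rho, s):
    """Product form (9.110): multiply the node-wise probabilities phi_i(x_i)."""
    prob = 1.0
    for x_i, rho_i, s_i in zip(x, rho, s):
        prob *= mms_probability(x_i, rho_i, s_i)
    return prob
```

For a single-server node the first function reduces to the geometric probabilities $(1-\rho)\rho^{\,j}$, which gives a quick plausibility check of an implementation.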


Queueing Networks with Feedback The simplest Jackson queueing network arises if
n = 1. The only difference from the queueing system M/M/s/$\infty$ is that now a positive
proportion of customers, who have departed from the network after having been
served, will return and require further service. This leads to a queueing system with
feedback (Figure 9.17). For instance, when servers have done a bad job, the affected
customers will soon return to exercise possible guarantee claims. Formally, these
customers remain in the network. Roughly speaking, a single-node Jackson queueing
network is a mixture of an open and a closed waiting system. A customer leaves
the system with probability a or reenters the system with probability $p_{11} = 1 - a$. If
there is an idle server, then, clearly, the service of such a customer starts immediately.

Figure 9.17 Queueing system with feedback

From (9.105) and (9.106), the total input rate $\alpha$ into the system satisfies

$$\alpha = \lambda + \alpha\,(1 - a).$$

(The index 1 is deleted from all system parameters.) Thus,

$$\alpha = \lambda / a.$$

Hence there exists a stationary distribution if

$$\lambda/a < s\mu \quad\text{or, equivalently, if}\quad \rho = \lambda/\mu < a\,s.$$


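In this case the stationary state probabilities are

$$\pi_j = \begin{cases}
\dfrac{1}{j!}\left(\dfrac{\rho}{a}\right)^{j} \pi_0 & \text{for } j = 1, 2, \ldots, s - 1, \\[2mm]
\dfrac{1}{s!\,s^{\,j-s}}\left(\dfrac{\rho}{a}\right)^{j} \pi_0 & \text{for } j = s, s + 1, \ldots,
\end{cases}$$

where

$$\pi_0 = \left[\,\sum_{j=0}^{s-1} \frac{1}{j!}\left(\frac{\rho}{a}\right)^{j} + \frac{(\rho/a)^{s}}{(s-1)!\,(s - \rho/a)}\right]^{-1}.$$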


Interestingly, this is the stationary state distribution of the queueing system M/M/s/$\infty$
(without feedback), the input of which has intensity $\lambda/a$.

Sequential Queueing Networks In technological processes, the sequence of service
is usually fixed. For example, a 'customer' may be a car being manufactured on an
assembly line. Therefore, queueing systems connected in series, called sequential
queueing networks or tandem queueing networks, are of considerable practical
interest: External customers arrive only at node 1 (arrival intensity $\lambda_1$). They
subsequently visit the nodes 1, 2, ..., n in this order and then leave the network.

Figure 9.18 Sequential queueing network

The corresponding parameters are (Figure 9.18):

$$\lambda_i = 0; \quad i = 2, 3, \ldots, n,$$
$$p_{i,i+1} = 1; \quad i = 1, 2, \ldots, n-1,$$
$$a_1 = a_2 = \cdots = a_{n-1} = 0, \quad a_n = 1.$$

According to (9.106), the (total) input intensities of all nodes in the steady state must
be the same:

$$\lambda_1 = \alpha_1 = \alpha_2 = \cdots = \alpha_n.$$

Hence, for single-server nodes ($s_i = 1$; $i = 1, 2, \ldots, n$), a stationary state distribution
exists if

$$\rho_i = \lambda_1/\mu_i < 1; \quad i = 1, 2, \ldots, n,$$

or, equivalently, if

$$\lambda_1 < \min(\mu_1, \mu_2, \ldots, \mu_n).$$

Thus, the slowest server determines the efficiency of a sequential network. The
stationary probability of state $x = (x_1, x_2, \ldots, x_n)$ is

$$\pi_x = \prod_{i=1}^{n} \rho_i^{\,x_i}\,(1 - \rho_i); \quad x \in Z.$$

The sequential network can be generalized by taking feedback into account. This is
left as an exercise to the reader. □


Example 9.20 Defective robots arrive at the admissions department of a maintenance
workshop in accordance with a homogeneous Poisson process with intensity
$\lambda = 0.2\ [h^{-1}]$. In the admissions department (denoted as (1)) a first failure diagnosis
is done. Depending on the result, the robots will have to visit other departments of
the workshop. These are departments for checking and repairing the mechanics (2),
electronics (3), and software (4) of the robots, respectively. The failure diagnosis in
the admissions department results in 60% of the arriving robots being sent to department
(2) and 20% each to the departments (3) and (4). After having been maintained
in department (2), 60% of the robots leave the workshop, 30% are sent to department
(3), and 10% to department (4). After having been served by department (3), 70% of
the robots leave the workshop, 20% are sent to department (2), and 10% are sent to
department (4). After elimination of possible software failures all robots leave the
workshop. A robot can be sent several times to one and the same department.

Figure 9.19 Maintenance workshop as a queueing network


The following transition probabilities result from the transfer of robots between the
departments:

$$p_{12} = 0.6, \quad p_{13} = 0.2, \quad p_{14} = 0.2,$$
$$p_{23} = 0.3, \quad p_{24} = 0.1,$$
$$p_{32} = 0.2, \quad p_{34} = 0.1.$$

The service intensities are assumed to be

$$\mu_1 = 1, \quad \mu_2 = 0.45, \quad \mu_3 = 0.4, \quad \mu_4 = 0.1\ [h^{-1}].$$

The graph plotted in Figure 9.19 illustrates the possible transitions between the
departments. The edges of the graph are weighted by the corresponding transition
probabilities. The system of equations (9.106) in the total input intensities is

$$\alpha_1 = 0.2$$
$$\alpha_2 = 0.6\,\alpha_1 + 0.2\,\alpha_3$$
$$\alpha_3 = 0.2\,\alpha_1 + 0.3\,\alpha_2$$
$$\alpha_4 = 0.2\,\alpha_1 + 0.1\,\alpha_2 + 0.1\,\alpha_3.$$

The solution is (after rounding)

$$\alpha_1 = 0.20, \quad \alpha_2 = 0.135, \quad \alpha_3 = 0.08, \quad \alpha_4 = 0.06.$$

The corresponding traffic intensities $\rho_i = \alpha_i/\mu_i$ are

$$\rho_1 = 0.2, \quad \rho_2 = 0.3, \quad \rho_3 = 0.2, \quad \rho_4 = 0.6.$$


From (9.110), the stationary probability of state $x = (x_1, x_2, x_3, x_4)$ for single-server
nodes is

$$\pi_x = \prod_{i=1}^{4} \rho_i^{\,x_i}\,(1 - \rho_i)$$

or

$$\pi_x = 0.1792\,(0.2)^{x_1}\,(0.3)^{x_2}\,(0.2)^{x_3}\,(0.6)^{x_4}; \quad x \in Z = \{0, 1, \ldots\}^4.$$

In particular, the stationary probability that there is no robot in the workshop is

$$\pi_{x_0} = 0.1792,$$

where $x_0 = (0, 0, 0, 0)$. Let $X_i$ denote the random number of robots at node i in the
steady state. Then the probability that, in the steady state, there is at least one robot
in the admissions department is

$$P(X_1 > 0) = 0.8 \sum_{i=1}^{\infty} (0.2)^i = 0.2.$$

Analogously,

$$P(X_2 > 0) = 0.3, \quad P(X_3 > 0) = 0.2, \quad\text{and}\quad P(X_4 > 0) = 0.6.$$

Thus, when there is a delay in servicing defective robots, the cause is most probably
department (4) in view of the comparatively high amount of time necessary for finding
and removing software failures. □
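
The figures of Example 9.20 can be checked with a few lines of Python. The sketch below is an illustration, not part of the original example: it solves (9.106) by repeatedly applying $\alpha \leftarrow \lambda + \alpha P$ (an iteration chosen here for brevity, which converges for this routing matrix because every robot eventually leaves the workshop) and works with unrounded intensities, so its results are close to, but not identical with, the rounded values quoted above.

```python
# Data of Example 9.20 (departments 1..4 are stored at indices 0..3).
lam = [0.2, 0.0, 0.0, 0.0]                 # external input only at admissions
P = [[0.0, 0.6, 0.2, 0.2],
     [0.0, 0.0, 0.3, 0.1],
     [0.0, 0.2, 0.0, 0.1],
     [0.0, 0.0, 0.0, 0.0]]
mu = [1.0, 0.45, 0.4, 0.1]                 # service intensities per hour

# Fixed-point iteration for the traffic equations (9.106).
alpha = lam[:]
for _ in range(200):
    alpha = [lam[j] + sum(alpha[i] * P[i][j] for i in range(4)) for j in range(4)]

# Traffic intensities; for single-server nodes rho_i = P(X_i > 0).
rho = [a / m for a, m in zip(alpha, mu)]

# Probability of an empty workshop: product of the factors (1 - rho_i).
p_empty = 1.0
for r in rho:
    p_empty *= 1.0 - r

# Department with the highest utilization (1-based number).
bottleneck = max(range(4), key=lambda i: rho[i]) + 1
```

With unrounded intensities the probability of an empty workshop comes out near 0.17 instead of the 0.1792 obtained from the rounded traffic intensities; the identification of department (4) as the bottleneck is unaffected.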


9.7.6.3 Closed Queueing Networks 


Analogously to the closed queueing system, customers cannot enter a closed queueing
network 'from outside'. Customers which have been served at a node do not leave
the network, but move to another node for further service. Hence, the number of
customers in a closed queueing network is a constant N. Practical examples for closed
queueing networks are multiprogrammed computer and communication systems.
When the service of a customer at node i is finished, then the customer moves with
probability $p_{ij}$ to node j for further service. Since the customers do not leave the
network,

$$\sum_{j=1}^{n} p_{ij} = 1, \quad i = 1, 2, \ldots, n, \qquad (9.111)$$

where as usual n is the number of nodes. Provided the discrete-time Markov chain given
by the transition matrix $P = ((p_{ij}))$ and the state space $Z = \{1, 2, \ldots, n\}$ is irreducible,
it has a stationary state distribution $\{\pi_1, \pi_2, \ldots, \pi_n\}$, which, according to (8.9), is the
unique solution of the system of equations

$$\pi_j = \sum_{i=1}^{n} p_{ij}\,\pi_i, \quad j = 1, 2, \ldots, n, \qquad (9.112)$$
$$1 = \sum_{i=1}^{n} \pi_i.$$

Let $X_i(t)$ be the random number of customers at node i at time t and

$$X(t) = (X_1(t), X_2(t), \ldots, X_n(t)).$$

The state space of the Markov chain $\{X(t), t \ge 0\}$ is

$$Z = \Big\{x = (x_1, x_2, \ldots, x_n) \text{ with } \sum_{i=1}^{n} x_i = N \text{ and } 0 \le x_i \le N\Big\}, \qquad (9.113)$$

where the $x_i$ are nonnegative integers. The number of elements (states) in Z is

$$\binom{n + N - 1}{N}.$$

Let $\mu_i = \mu_i(x_i)$ be the service intensity of all servers at node i if there are $x_i$ customers
at this node, $\mu_i(0) = 0$. Then $\{X(t), t \ge 0\}$ has the positive transition rates

$$q_{x,\,x-e_i+e_j} = \mu_i(x_i)\,p_{ij}; \quad x_i \ge 1,\ i \ne j,$$
$$q_{x-e_i+e_j,\,x} = \mu_j(x_j + 1)\,p_{ji}; \quad i \ne j,\ x - e_i + e_j \in Z,$$

where the $e_i$ are given by (9.108). From (9.111), the rate of leaving state x is

$$q_x = \sum_{i=1}^{n} \mu_i(x_i)\,(1 - p_{ii}).$$


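Hence, according to (9.28), the stationary distribution $\{\pi_x, x \in Z\}$ of the Markov
chain $\{X(t), t \ge 0\}$ satisfies

$$\sum_{i=1}^{n} \mu_i(x_i)\,(1 - p_{ii})\,\pi_x = \sum_{i,j=1,\ i \ne j}^{n} \mu_j(x_j + 1)\,p_{ji}\,\pi_{x-e_i+e_j}, \qquad (9.114)$$

where $x = (x_1, x_2, \ldots, x_n) \in Z$. In these equations, all $\pi_{x-e_i+e_j}$ with $x - e_i + e_j \notin Z$ are
equal to 0. Let $\varphi_i(0) = 1$ and

$$\varphi_i(j) = \prod_{k=1}^{j} \frac{\pi_i}{\mu_i(k)}; \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, N.$$

Then the stationary probability of state $x = (x_1, x_2, \ldots, x_n) \in Z$ is

$$\pi_x = h \prod_{i=1}^{n} \varphi_i(x_i), \qquad h = \left[\,\sum_{y \in Z} \prod_{i=1}^{n} \varphi_i(y_i)\right]^{-1} \qquad (9.115)$$

with $y = (y_1, y_2, \ldots, y_n)$. By substituting (9.115) into (9.114) one readily verifies that
$\{\pi_x, x \in Z\}$ is indeed a stationary distribution of the Markov chain $\{X(t), t \ge 0\}$.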
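
For small networks the normalizing constant h in (9.115) can be computed by enumerating the state space (9.113). The Python sketch below illustrates this brute-force approach; it is an assumption-laden illustration rather than an efficient algorithm, and the names `compositions` and `closed_network_distribution` as well as the convention that `mu(i, k)` returns the service intensity at node i (0-based) with k customers present are introduced here only for the example.

```python
def compositions(N, n):
    """Yield all vectors of n nonnegative integers that sum to N (the set Z)."""
    if n == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, n - 1):
            yield (first,) + rest


def closed_network_distribution(pi, mu, N):
    """Stationary distribution (9.115) of a closed network with N customers.

    pi : stationary distribution of the routing chain, solution of (9.112)
    mu : function mu(i, k) giving the service intensity at node i (0-based)
         when k >= 1 customers are at that node
    """
    def phi(i, j):
        value = 1.0
        for k in range(1, j + 1):
            value *= pi[i] / mu(i, k)
        return value

    weights = {}
    for x in compositions(N, len(pi)):
        w = 1.0
        for i, x_i in enumerate(x):
            w *= phi(i, x_i)
        weights[x] = w
    h = 1.0 / sum(weights.values())
    return {x: h * w for x, w in weights.items()}


# Hypothetical check: three identical nodes, uniform routing distribution, N = 2.
uniform_case = closed_network_distribution([1 / 3] * 3, lambda i, k: 1.0, 2)

# Hypothetical check with one customer and node-dependent service rates 1, 2, 4.
one_customer = closed_network_distribution([1 / 3] * 3,
                                           lambda i, k: [1.0, 2.0, 4.0][i], 1)
```

The number of states grows as $\binom{n+N-1}{N}$, so plain enumeration of this kind is only practical for small n and N.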


Example 9.21 Consider a closed sequential queueing network, which has a single
server at each of its n nodes (Figure 9.20). There is only N = 1 customer in the system.
When this customer is being served at a certain node, the other nodes are empty.
Hence, with vectors $e_i$ as defined by (9.108), the state space of the corresponding
Markov chain $\{X(t), t \ge 0\}$ is $Z = \{e_1, e_2, \ldots, e_n\}$. The transition probabilities are

$$p_{i,i+1} = 1; \quad i = 1, 2, \ldots, n-1; \qquad p_{n,1} = 1.$$

Figure 9.20 Closed sequential queueing network

The corresponding solution of (9.112) is a uniform distribution

$$\pi_1 = \pi_2 = \cdots = \pi_n = 1/n.$$

Let $\mu_i = \mu_i(1)$ be the service rate at node i. Then, for $i = 1, 2, \ldots, n$,

$$\varphi_i(0) = 1, \quad \varphi_i(1) = \frac{1}{n\,\mu_i}, \quad h = n \left[\,\sum_{i=1}^{n} \frac{1}{\mu_i}\right]^{-1}.$$

Hence, the stationary state probabilities (9.115) are

$$\pi_{e_i} = \frac{1/\mu_i}{\sum_{j=1}^{n} 1/\mu_j}; \quad i = 1, 2, \ldots, n.$$

In particular, if $\mu_i = \mu$; $i = 1, 2, \ldots, n$, then the states $e_i$ have a uniform distribution:

$$\pi_{e_i} = 1/n; \quad i = 1, 2, \ldots, n.$$

If there are $N \ge 1$ customers in the system and the $\mu_i$ do not depend on $x_i$, then the
stationary state probabilities are

$$\pi_x = \frac{(1/\mu_1)^{x_1}\,(1/\mu_2)^{x_2} \cdots (1/\mu_n)^{x_n}}{\sum_{y \in Z} \prod_{i=1}^{n} (1/\mu_i)^{y_i}},$$

where $x = (x_1, x_2, \ldots, x_n) \in Z$. Given $\mu_i = \mu$, $i = 1, 2, \ldots, n$, the states again have a
uniform distribution:

$$\pi_x = \frac{1}{\binom{n+N-1}{N}}, \quad x \in Z. \qquad □$$


Example 9.22 A computer system consists of two central processors 2 and 3, a disc
drive 1, and a printer 4. A new program starts in the central processor 2. When this
processor has finished its computing job, the computing phase continues in central
processor 3 with probability $\alpha$ or the program goes to the disc drive with probability
$1 - \alpha$. From the disc drive the program goes to central processor 3 with probability 1.
From central processor 3 it goes to the central processor 2 with probability $\beta$ or to the
printer with probability $1 - \beta$. Here it terminates or goes back to central processor 2.
When a program terminates, then another program (from outside) immediately joins
the queue of central processor 2 so that there is always a fixed number of programs
in the system. Hence, a program formally goes from the printer to the central
processor 2 with probability 1. If N denotes the constant number of programs in the system,
this situation represents a simple case of multiprogramming with N as the level of
multiprogramming. The state space Z of this system and the matrix P of the
transition probabilities $p_{ij}$ are

$$Z = \{y = (y_1, y_2, y_3, y_4);\ y_i = 0, 1, \ldots, N;\ y_1 + y_2 + y_3 + y_4 = N\}$$

and

$$P = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1-\alpha & 0 & \alpha & 0 \\ 0 & \beta & 0 & 1-\beta \\ 0 & 1 & 0 & 0 \end{pmatrix},$$

respectively (Figure 9.21).

Figure 9.21 Computer system as a closed queueing network

The corresponding solution of (9.112) is

$$\pi_1 = \frac{1-\alpha}{4-\alpha-\beta}, \quad \pi_2 = \pi_3 = \frac{1}{4-\alpha-\beta}, \quad \pi_4 = \frac{1-\beta}{4-\alpha-\beta}.$$

Let the service intensities of the nodes $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ be independent of the
number of programs at the nodes. Then,

$$\varphi_i(x_i) = \left(\frac{\pi_i}{\mu_i}\right)^{x_i}, \quad i = 1, 2, 3, 4.$$

Hence, the stationary probability of state
$x = (x_1, x_2, x_3, x_4)$ with $x_1 + x_2 + x_3 + x_4 = N$ is

$$\pi_x = \frac{h}{(4-\alpha-\beta)^N} \left(\frac{1-\alpha}{\mu_1}\right)^{x_1} \left(\frac{1}{\mu_2}\right)^{x_2} \left(\frac{1}{\mu_3}\right)^{x_3} \left(\frac{1-\beta}{\mu_4}\right)^{x_4}$$

with

$$h = \frac{(4-\alpha-\beta)^N}{\sum_{y \in Z} \left(\frac{1-\alpha}{\mu_1}\right)^{y_1} \left(\frac{1}{\mu_2}\right)^{y_2} \left(\frac{1}{\mu_3}\right)^{y_3} \left(\frac{1-\beta}{\mu_4}\right)^{y_4}}.$$

Application-oriented treatments of queueing networks are given in Gelenbe, Pujolle (1987)
and Walrand (1988).

9.8 SEMI-MARKOV CHAINS


Transitions between the states of a continuous-time homogeneous Markov chain are 
controlled by its transition probabilities. According to section 9.4, the sojourn time in 
a state has an exponential distribution and depends on the current state, but not on 
the history of the process. Since in most applications the sojourn times in system 
states are non-exponential random variables, an obvious generalization is to allow 
arbitrarily distributed sojourn times whilst retaining the transition mechanism between 
the states. This approach leads to semi-Markov chains. 


A semi-Markov chain with state space $Z = \{0, 1, \ldots\}$ evolves in the following way:
Transitions between the states are governed by a discrete-time homogeneous Markov
chain $\{X_0, X_1, \ldots\}$ with state space Z and matrix of transition probabilities

$$P = ((p_{ij})).$$

If the process starts at time t = 0 in state $i_0$, then the subsequent state $i_1$ is determined
according to the transition matrix P, while the process stays in state $i_0$ a random
time $Y_{i_0 i_1}$. After that the state $i_2$, following state $i_1$, is determined. The process stays
in state $i_1$ a random time $Y_{i_1 i_2}$, and so on. The random variables $Y_{ij}$ are the conditional
sojourn times of the process in state i given that the process makes a transition
from i to j. They are assumed to be independent. Hence, immediately after entering a
state at a time t, the further evolvement of a semi-Markov chain depends only on its
state at this time point, but not on the evolvement of the process before t. The sample
paths of a semi-Markov chain are piecewise constant functions which, by convention,
are continuous on the right. In contrast to homogeneous continuous-time Markov
chains, for predicting the development of a semi-Markov chain from a time point t, it
is not only necessary to know its current state i, but also the 'age' of i at time t.

Let $T_0, T_1, \ldots$ denote the sequence of time points at which the semi-Markov chain
makes a transition from one state to another (or to the same state). Then

$$X_n = X(T_n); \quad n = 0, 1, \ldots, \qquad (9.116)$$

where $X_0 = X(0)$ is the initial state ($X_n = X(T_n + 0)$). Hence, the transition
probabilities can be written in the following form:

$$p_{ij} = P(X(T_{n+1}) = j \mid X(T_n) = i); \quad n = 0, 1, \ldots.$$

In view of (9.116), the discrete-time stochastic process $\{X_0, X_1, \ldots\}$ is embedded in
the (continuous-time) semi-Markov chain $\{X(t), t \ge 0\}$ (see page 401).


As already pointed out, the future development of a semi-Markov chain from a jump
point $T_n$ is independent of the entire history of the process before $T_n$. Let

$$F_{ij}(t) = P(Y_{ij} \le t), \quad i, j \in Z,$$

denote the distribution function of the conditional sojourn time $Y_{ij}$ of a semi-Markov
chain in state i if the subsequent state is j. By the total probability rule, the
distribution function of the unconditional sojourn time $Y_i$ of the chain in state i is

$$F_i(t) = P(Y_i \le t) = \sum_{j \in Z} p_{ij}\,F_{ij}(t), \quad i \in Z. \qquad (9.117)$$


Special Cases 1) An alternating renewal process (page 319) is a semi-Markov chain
with state space $Z = \{0, 1\}$ and transition probabilities

$$p_{00} = p_{11} = 0 \quad\text{and}\quad p_{01} = p_{10} = 1.$$

The states 0 and 1 indicate that the system is under renewal or operating, respectively.
In this case, $F_{01}(\cdot)$ and $F_{10}(\cdot)$ are in this order the distribution functions of the
renewal time and the system lifetime.

2) A homogeneous Markov chain in continuous time with state space $Z = \{0, 1, \ldots\}$ is
a semi-Markov chain with the same state space and transition probabilities (9.34):

$$p_{ij} = \frac{q_{ij}}{q_i}, \quad i \ne j,$$

where $q_{ij}$ ($q_i$) are the conditional (unconditional) transition rates of the Markov
chain. By (9.31), the distribution function of the unconditional sojourn time in state i
is

$$F_i(t) = 1 - e^{-q_i t}, \quad t \ge 0.$$


In what follows, semi-Markov chains are considered under the following three
assumptions:

1) The embedded homogeneous Markov chain $\{X_0, X_1, \ldots\}$ has a unique stationary
state distribution $\{\pi_0, \pi_1, \ldots\}$. By (8.9), this distribution is the solution of

$$\pi_j = \sum_{i \in Z} p_{ij}\,\pi_i, \quad j \in Z; \qquad \sum_{i \in Z} \pi_i = 1. \qquad (9.118)$$

As pointed out in section 8.3, a unique stationary state distribution exists if the Markov
chain is aperiodic, irreducible, and positive recurrent.

2) The distribution functions $F_i(t) = P(Y_i \le t)$ are nonarithmetic (see definition 5.3,
page 216).

3) The mean sojourn times of the process in all states are finite:

$$\mu_i = E(Y_i) = \int_0^{\infty} (1 - F_i(t))\,dt < \infty, \quad i \in Z.$$

Note: In this section $\mu_i$ no longer denotes an intensity, but a mean sojourn time.


In what follows, a transition of the semi-Markov chain into state k is called a k-transition.
Let $N_k(t)$ be the random number of k-transitions occurring in (0, t] and $H_k(t)$
its mean value: $H_k(t) = E(N_k(t))$. Then, for any $\tau > 0$,

$$\lim_{t \to \infty} \left[H_k(t + \tau) - H_k(t)\right] = \frac{\tau\,\pi_k}{\sum_{i \in Z} \pi_i\,\mu_i}, \quad k \in Z. \qquad (9.119)$$

This relationship implies that after a sufficiently long time period the number of
k-transitions in a given time interval no longer depends on the position of this
interval, but only on its length. Strictly speaking, the right-hand side of (9.119) gives
the mean number of k-transitions in an interval of length $\tau$ once the process has
reached its stationary regime, or, in other words, if it is in the steady state. The
following formulas and the analysis of examples are based on (9.119), but the definition
and properties of stationary semi-Markov chains will not be discussed in detail.


From (9.119), when the process is in the steady state, the mean number of k-transitions
per unit time is

$$U_k = \frac{\pi_k}{\sum_{i \in Z} \pi_i\,\mu_i}.$$

Hence the portion of time the chain is in state k is

$$A_k = \frac{\pi_k\,\mu_k}{\sum_{i \in Z} \pi_i\,\mu_i}. \qquad (9.120)$$

Consequently, in the long run, the fraction of time the chain is in a set of states $Z_0$,
$Z_0 \subseteq Z$, is

$$A_{Z_0} = \frac{\sum_{k \in Z_0} \pi_k\,\mu_k}{\sum_{i \in Z} \pi_i\,\mu_i}. \qquad (9.121)$$

In other words, $A_{Z_0}$ is the probability that a visitor, who arrives at a random time
from 'outside', finds the semi-Markov chain in a state belonging to $Z_0$.

Let $c_k$ denote the cost which is caused by a k-transition of the system. Then the
mean total (transition) cost per unit time is

$$C = \frac{\sum_{k \in Z} \pi_k\,c_k}{\sum_{i \in Z} \pi_i\,\mu_i}. \qquad (9.122)$$

Note that the formulas (9.119) to (9.122) depend only on the unconditional sojourn
times of a semi-Markov chain in its states. This property facilitates their application.
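
Once the stationary distribution of the embedded chain and the mean sojourn times are known, the quantities above are simple ratios. The following Python sketch shows one way to evaluate them; the function name `semi_markov_summary` and the numbers in the small alternating-renewal illustration (mean renewal time 2, mean lifetime 18, cost 3 per transition into the renewal state) are assumptions made for this example only.

```python
def semi_markov_summary(pi, mu, cost=None):
    """Evaluate the steady-state quantities derived from (9.119).

    pi   : stationary distribution of the embedded Markov chain
    mu   : mean (unconditional) sojourn times in the states
    cost : optional costs c_k caused by a transition into state k

    Returns (U, A, C): transition rates U_k, time fractions A_k from (9.120),
    and the mean cost per unit time C from (9.122), or None if no costs are given.
    """
    denominator = sum(p * m for p, m in zip(pi, mu))
    U = [p / denominator for p in pi]
    A = [p * m / denominator for p, m in zip(pi, mu)]
    C = None
    if cost is not None:
        C = sum(p * c for p, c in zip(pi, cost)) / denominator
    return U, A, C


# Hypothetical alternating renewal process: state 0 = under renewal, 1 = operating.
U, A, C = semi_markov_summary(pi=[0.5, 0.5], mu=[2.0, 18.0], cost=[3.0, 0.0])
```

In this illustration the system operates 90% of the time, and the fraction of time spent in any set of states, as in (9.121), is obtained by summing the corresponding entries of `A`.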


Example 9.23 (age renewal policy) The system is renewed upon failure by an
emergency renewal or at age $\tau$ by a preventive renewal, whichever occurs first.

To determine the stationary system availability, system states have to be introduced:

0 operating
1 emergency renewal
2 preventive renewal

Figure 9.22 Transition graph for example 9.23

Let L be the random system lifetime, $F(t) = P(L \le t)$ its distribution function, and

$$\bar{F}(t) = 1 - F(t) = P(L > t)$$

its survival probability. Then the positive transition probabilities between the states
are (Figure 9.22)

$$p_{01} = F(\tau), \quad p_{02} = \bar{F}(\tau), \quad p_{10} = p_{20} = 1.$$

Let $Z_e$ and $Z_p$ be the random times for emergency renewals and preventive renewals,
respectively. Then the conditional sojourn times of the system in the states are

$$Y_{01} = L, \quad Y_{02} = \tau, \quad Y_{10} = Z_e, \quad Y_{20} = Z_p.$$

The unconditional sojourn times are

$$Y_0 = \min(L, \tau), \quad Y_1 = Z_e, \quad Y_2 = Z_p.$$

The system behaviour can be described by a semi-Markov chain $\{X(t), t \ge 0\}$ with
state space $Z = \{0, 1, 2\}$ and the transition probabilities and sojourn times given. The
corresponding equations (9.118) in the stationary probabilities of the embedded Markov
chain are

$$\pi_0 = \pi_1 + \pi_2$$
$$\pi_1 = F(\tau)\,\pi_0$$
$$1 = \pi_0 + \pi_1 + \pi_2.$$

The solution is

$$\pi_0 = 1/2, \quad \pi_1 = F(\tau)/2, \quad \pi_2 = \bar{F}(\tau)/2.$$

The mean sojourn times are

$$\mu_0 = \int_0^{\tau} \bar{F}(t)\,dt, \quad \mu_1 = d_e, \quad \mu_2 = d_p.$$

According to (9.120), the stationary availability $A_0 = A(\tau)$ of the system is

$$A(\tau) = \frac{\mu_0\,\pi_0}{\mu_0\,\pi_0 + \mu_1\,\pi_1 + \mu_2\,\pi_2}$$

or

$$A(\tau) = \frac{\int_0^{\tau} \bar{F}(t)\,dt}{\int_0^{\tau} \bar{F}(t)\,dt + d_e\,F(\tau) + d_p\,\bar{F}(\tau)}. \qquad (9.123)$$

It is important that this result does not depend on the probability distributions of the
renewal times $Z_e$ and $Z_p$, but only on their mean values. With $\lambda(t)$ denoting the failure
rate of L, an optimal renewal interval $\tau = \tau^*$ satisfies the equation $dA(\tau)/d\tau = 0$ or

$$\lambda(\tau) \int_0^{\tau} \bar{F}(t)\,dt - F(\tau) = \frac{d}{1-d} \qquad (9.124)$$


with $d = d_p/d_e$. A unique solution of this equation exists if $\lambda(t)$ is strictly increasing
and $d_p < d_e$, i.e. $d < 1$. (Otherwise preventive renewals would not make sense.)
By coupling the equations (9.123) and (9.124) the corresponding maximal long-run
availability $A(\tau^*)$ is seen to have the structure

$$A(\tau^*) = \frac{1}{1 + (d_e - d_p)\,\lambda(\tau^*)}. \qquad (9.125)$$

As a numerical special case, let L have a Rayleigh distribution with parameter $\theta$ and
let the mean renewal times be $d_e = 10$ and $d_p = 2$. Then

$$F(t) = P(L \le t) = 1 - e^{-(t/\theta)^2}, \quad t \ge 0,$$

and, by formula (2.80), page 77, L has mean value

$$E(L) = \theta\,\sqrt{\pi/4}.$$

Since the corresponding failure rate is $\lambda(t) = 2t/\theta^2$, equation (9.124) becomes

$$2\,\frac{\tau}{\theta} \int_0^{\tau/\theta} e^{-x^2}\,dx + e^{-(\tau/\theta)^2} = 1.25.$$

The unique solution is $\tau^* = 0.5107 \cdot \theta$. This holds for any $\theta$. ($\theta$ is a scale parameter.)
By (9.125), the maximal stationary availability is

$$A(\tau^*) = \frac{\theta}{\theta + 8.1712},$$

whereas the stationary availability of the system without preventive renewals is smaller:

$$A = \frac{E(L)}{E(L) + d_e} = \frac{\theta}{\theta + 11.2838}.$$

If the renewal times are negligibly small, but the mean costs $c_e$ and $c_p$ for emergency
and preventive renewals, respectively, are relevant, then, from (9.122), the mean
renewal cost per unit time in the steady state is

$$K(\tau) = \frac{c_e\,\pi_1 + c_p\,\pi_2}{\mu_0\,\pi_0} = \frac{c_e\,F(\tau) + c_p\,\bar{F}(\tau)}{\int_0^{\tau} \bar{F}(t)\,dt}.$$

Analogously to the corresponding renewal times, $c_e$ and $c_p$ can be thought of as mean
values of arbitrarily distributed renewal costs. Since $K(\tau)$ has the same functional
structure as $1/A(\tau) - 1$, maximizing $A(\tau)$ and minimizing $K(\tau)$ lead to the same
equation (9.124) if d there is replaced with $c = c_p/c_e$. □
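
The numerical values of the Rayleigh special case in Example 9.23 can be reproduced with a short root search. The Python sketch below is an illustration, not the book's procedure: it writes (9.124) in the scaled variable $u = \tau/\theta$, uses the identity $\int_0^u e^{-x^2}dx = \tfrac{\sqrt{\pi}}{2}\,\mathrm{erf}(u)$, and applies bisection. The function names and the choice $\theta = 100$ are assumptions made for this example.

```python
from math import erf, exp, pi, sqrt


def balance(u, d):
    """Left-hand side minus right-hand side of (9.124) for a Rayleigh lifetime.

    u = tau / theta is the renewal age in units of the scale parameter theta,
    d = d_p / d_e is the ratio of the mean renewal times.
    """
    integral = 0.5 * sqrt(pi) * erf(u)        # integral of exp(-x^2) over [0, u]
    return 2.0 * u * integral - (1.0 - exp(-u * u)) - d / (1.0 - d)


def optimal_ratio(d, lo=0.0, hi=10.0, tol=1e-12):
    """Bisection for the root u* = tau*/theta of balance(u, d) = 0.

    balance is increasing in u because the Rayleigh failure rate is increasing.
    """
    if balance(hi, d) <= 0.0:
        raise ValueError("upper bracket too small for this value of d")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if balance(mid, d) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_e, d_p = 10.0, 2.0          # mean renewal times from the example
theta = 100.0                 # arbitrary illustrative scale parameter

u_star = optimal_ratio(d_p / d_e)
tau_star = u_star * theta
failure_rate = 2.0 * tau_star / theta**2
a_max = 1.0 / (1.0 + (d_e - d_p) * failure_rate)      # formula (9.125)

mean_life = theta * sqrt(pi) / 2.0                     # E(L) = theta * sqrt(pi/4)
a_no_prevention = mean_life / (mean_life + d_e)
```

For other lifetime distributions the same bisection can be reused once `balance` is replaced by the corresponding left-hand side of (9.124).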


Example 9.24 A series system consists of n subsystems $e_1, e_2, \ldots, e_n$. The lifetimes
of the subsystems $L_1, L_2, \ldots, L_n$ are independent exponential random variables with
parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let

$$G_i(t) = P(L_i \le t) = 1 - e^{-\lambda_i t}, \quad g_i(t) = \lambda_i\,e^{-\lambda_i t}, \quad t \ge 0;\ i = 1, 2, \ldots, n.$$

When a subsystem fails, the system interrupts its work. As soon as the renewal of the
failed subsystem is finished, the system continues operating. Let $\mu_i$ be the average
renewal time of subsystem $e_i$. As long as a subsystem is being renewed, the other
subsystems cannot fail, i.e. during such a time period they are in the cold-standby
mode. The following system states are introduced:

X(t) = 0 if the system is operating,
X(t) = i if subsystem $e_i$ is under renewal, $i = 1, 2, \ldots, n$.

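Then $\{X(t), t \ge 0\}$ is a semi-Markov chain with state space $Z = \{0, 1, \ldots, n\}$. The
conditional sojourn times in state 0 of this semi-Markov chain are

$$Y_{0i} = L_i, \quad i = 1, 2, \ldots, n,$$

and its unconditional sojourn time in state 0 is

$$Y_0 = \min\{L_1, L_2, \ldots, L_n\}.$$

Thus, with $\bar{G}_i(t) = 1 - G_i(t) = e^{-\lambda_i t}$, $Y_0$ has distribution function

$$F_0(t) = 1 - \bar{G}_1(t)\,\bar{G}_2(t) \cdots \bar{G}_n(t).$$

Letting $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_n$ implies

$$F_0(t) = 1 - e^{-\lambda t}, \quad t \ge 0, \qquad \mu_0 = E(Y_0) = 1/\lambda.$$

The system makes a transition from state 0 into state i with probability

$$p_{0i} = P(Y_0 = L_i) = \int_0^{\infty} \bar{G}_1(x) \cdots \bar{G}_{i-1}(x)\,\bar{G}_{i+1}(x) \cdots \bar{G}_n(x)\,g_i(x)\,dx$$

$$= \int_0^{\infty} e^{-(\lambda_1 + \cdots + \lambda_{i-1} + \lambda_{i+1} + \cdots + \lambda_n)x}\,\lambda_i\,e^{-\lambda_i x}\,dx = \int_0^{\infty} \lambda_i\,e^{-\lambda x}\,dx.$$

Hence,

$$p_{0i} = \frac{\lambda_i}{\lambda}, \quad p_{i0} = 1; \quad i = 1, 2, \ldots, n.$$

Thus, the system of equations (9.118) becomes

$$\pi_0 = \pi_1 + \pi_2 + \cdots + \pi_n,$$
$$\pi_i = \frac{\lambda_i}{\lambda}\,\pi_0; \quad i = 1, 2, \ldots, n.$$

In view of $\pi_1 + \pi_2 + \cdots + \pi_n = 1 - \pi_0$, the solution is

$$\pi_0 = \frac{1}{2}, \quad \pi_i = \frac{\lambda_i}{2\lambda}; \quad i = 1, 2, \ldots, n.$$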
With these ingredients, formula (9.120) yields the stationary system availability

$$A_0 = \frac{1}{1 + \sum_{i=1}^{n} \lambda_i\,\mu_i}.$$

Example 9.25 Consider the loss system M/G/1/0 on condition that the server is
subjected to failures: Customers arrive according to a homogeneous Poisson process
with rate $\lambda$. Hence, their interarrival times are identically distributed as an exponential
random variable Y with parameter $\lambda$. The server has random lifetime $L_0$ when
being idle, and a random lifetime $L_1$ when being busy. $L_0$ is exponential with parameter
$\lambda_0$, and $L_1$ is exponential with parameter $\lambda_1$. The service time Z has distribution
function B(t) with density b(t). When at the time point of server failure a customer
is being served, then this customer is lost, i.e., it has to leave the system. All occurring
random variables are assumed to be independent. To describe the behavior of
this system by a semi-Markov chain, three states are introduced:

State 0 The server is idle, but available.
State 1 The server is busy.
State 2 The server is under repair (not available).


To determine the steady state probabilities of the states 0, 1, and 2, the transition prob- 
abilities are needed: 


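$$p_{00} = p_{11} = p_{22} = p_{21} = 0, \quad p_{20} = 1,$$
$$p_{01} = P(L_0 > Y) = \int_0^\infty e^{-\lambda_0 t}\,\lambda e^{-\lambda t}\,dt = \frac{\lambda}{\lambda + \lambda_0},$$
$$p_{02} = 1 - p_{01} = P(L_0 \le Y) = \frac{\lambda_0}{\lambda + \lambda_0},$$
$$p_{10} = P(L_1 > Z) = \int_0^\infty e^{-\lambda_1 t}\, b(t)\,dt,$$
$$p_{12} = 1 - p_{10} = P(L_1 \le Z) = \int_0^\infty \left[1 - e^{-\lambda_1 t}\right] b(t)\,dt.$$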


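With these transition probabilities, the stationary state probabilities of the embedded
Markov chain $\{X_0, X_1, \ldots\}$ can be obtained from (9.118):
$$\pi_0 = \frac{\lambda + \lambda_0}{2(\lambda_0 + \lambda) + \lambda p_{12}}, \quad \pi_1 = \frac{\lambda}{2(\lambda_0 + \lambda) + \lambda p_{12}}, \quad \pi_2 = \frac{\lambda_0 + \lambda p_{12}}{2(\lambda_0 + \lambda) + \lambda p_{12}}.$$
The sojourn times in states 0, 1, and 2 are
$$Y_0 = \min(L_0, Y), \quad Y_1 = \min(L_1, Z), \quad Y_2 = Z.$$
Hence, the mean sojourn times are
$$\mu_0 = \frac{1}{\lambda + \lambda_0}, \quad \mu_1 = \int_0^\infty \left[1 - B(t)\right] e^{-\lambda_1 t}\,dt, \quad \mu_2 = E(Z).$$
With these parameters, the stationary state probabilities of the semi-Markov process
are given by (9.120). □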
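For a concrete impression of Example 9.25, the quantities above can be evaluated in closed form when the service time is exponential. The sketch below rests on two assumptions that are not part of the example: the service time $Z$ is exponential with a hypothetical rate `beta`, so that $p_{10} = \beta/(\beta + \lambda_1)$ and $\mu_1 = 1/(\beta + \lambda_1)$, and formula (9.120) has the time-weighted form $\pi_i\mu_i / \sum_j \pi_j\mu_j$. The mean sojourn time in state 2 is taken as $E(Z) = 1/\beta$, following the text.

```python
from fractions import Fraction


def loss_system_probabilities(lam, lam0, lam1, beta):
    """Return (pi, prob): embedded-chain and semi-Markov stationary probabilities.

    Assumes an exponential service time with rate beta (an illustration only).
    """
    p12 = lam1 / (beta + lam1)  # server fails before the service ends
    denom = 2 * (lam + lam0) + lam * p12
    pi = [(lam + lam0) / denom, lam / denom, (lam0 + lam * p12) / denom]
    mu = [1 / (lam + lam0), 1 / (beta + lam1), 1 / beta]  # mean sojourn times
    weights = [p * m for p, m in zip(pi, mu)]
    total = sum(weights)
    return pi, [w / total for w in weights]


# Hypothetical rates per hour: arrivals 2, idle failures 1/10, busy failures 1/5, service 3.
pi, prob = loss_system_probabilities(Fraction(2), Fraction(1, 10), Fraction(1, 5), Fraction(3))
print(pi)
print(prob)
```

Exact fractions are used so that the balance equations of the embedded chain can be verified without rounding error; floats work equally well.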


The time-dependent behaviour of semi-Markov chains is discussed, for instance, in 
Kulkarni (2010). 
9.9 EXERCISES


9.1) Let $\mathbf{Z} = \{0, 1\}$ be the state space and
$$P(t) = \begin{pmatrix} e^{-t} & 1 - e^{-t} \\ 1 - e^{-t} & e^{-t} \end{pmatrix}$$
the transition matrix of a continuous-time stochastic process $\{X(t),\ t \ge 0\}$. Check
whether $\{X(t),\ t \ge 0\}$ is a homogeneous Markov chain.


9.2) A system fails after a random lifetime $L$. Then it waits a random time $W$ for
renewal. A renewal takes another random time $Z$. The random variables $L$, $W$, and $Z$
have exponential distributions with parameters $\lambda$, $\nu$, and $\mu$, respectively. On
completion of a renewal, the system immediately resumes its work. This process continues
indefinitely. All life, waiting, and renewal times are assumed to be independent. Let
the system be in states 0, 1, and 2 when it is operating, waiting, or being renewed,
respectively. The transitions between the states are governed by a Markov chain
$\{X(t),\ t \ge 0\}$.

(1) Draw the transition graph of $\{X(t),\ t \ge 0\}$ and set up a system of linear differential
equations for the time-dependent state probabilities $p_i(t) = P(X(t) = i)$, $i = 0, 1, 2$.

(2) Use this system to derive an algebraic system of equations for the stationary state
probabilities $\pi_i$ of $\{X(t),\ t \ge 0\}$. Determine the stationary availability of the system.


9.3) Consider a 1-out-of-2 system, i.e., the system is operating when at least one of
its two subsystems is operating. When a subsystem fails, the other one continues to
work. On its failure, the joint renewal of both subsystems begins. On its completion,
both subsystems resume their work at the same time. The lifetimes of the subsystems
are identically exponential with parameter $\lambda$. The joint renewal time is exponential
with parameter $\mu$. All life and renewal times are independent of each other. Let $X(t)$
be the number of subsystems operating at time $t$.

(1) Draw the transition graph of the Markov chain $\{X(t),\ t \ge 0\}$.

(2) Given the initial condition $P(X(0) = 2) = 1$, determine the time-dependent state
probabilities $p_i(t) = P(X(t) = i)$, $i = 0, 1, 2$, and the stationary state distribution.

Hint Consider separately the cases $(\lambda + \mu + \nu)^2\ (=)(<)(>)\ 4(\lambda\mu + \lambda\nu + \mu\nu)$.


9.4) A copy center has 10 copy machines of the same type which are in constant use. 
The times between two successive failures of a machine have an exponential distribu- 
tion with mean value 100 hours. There are two mechanics who repair failed machines. 
A defective machine is repaired by only one mechanic. During this time, the second 
mechanic is busy repairing another failed machine, if there are any, or this mechanic 
is idle. All repair times have an exponential distribution with mean value 4 hours. All
random variables involved are independent. Consider the steady state. 


(1) What is the average percentage of operating machines? 
(2) What is the average percentage of idle mechanics? 
9.5) Consider the two-unit system with standby redundancy discussed in example
9.5 a) on condition that the lifetimes of the units are exponential with respective
parameters $\lambda_1$ and $\lambda_2$. The other model assumptions listed in example 9.5 remain
valid.


Model the system by a Markov chain and draw the transition graph. 


9.6) Consider the two-unit system with parallel redundancy discussed in example 9.6
on condition that the lifetimes of the units are exponential with parameters $\lambda_1$ and
$\lambda_2$, respectively. The other model assumptions listed in example 9.6 remain valid.


Model the behavior of the system by a Markov chain and draw the transition graph. 


9.7) The system considered in example 9.7 is generalized as follows: If the system
makes a direct transition from state 0 to the blocking state 2, then the subsequent
renewal time is exponential with parameter $\mu_0$. If the system makes a transition from
state 1 to state 2, then the subsequent renewal time is exponential with parameter $\mu_1$.

(1) Model the system by a Markov chain and draw the transition graph.


(2) What is the stationary probability that the system is blocked? 


9.8) Consider a two-unit system with standby redundancy and one mechanic. All 
repair times of failed units have an Erlang distribution with parameters $n = 2$ and $\mu$.
Apart from this, the other model assumptions listed in example 9.5 remain valid. 

(1) Model the system by a Markov chain and draw the transition graph. 

(2) Determine the stationary state probabilities of the system. 

(3) Sketch the stationary availability of the system as a function of $\rho = \lambda/\mu$.


9.9) Consider a two-unit parallel system (i.e., the system operates if at least one unit
is operating). The lifetimes of the units have an exponential distribution with
parameter $\lambda$. There is one repairman, who can only attend one failed unit at a time.
Repair times have an Erlang distribution with parameters $n = 2$ and $\lambda = 1/2$. The system
arrives at the failed state as soon as a unit fails during the repair of the other one. All 
life and repair times are assumed to be independent. 

(1) By using Erlang's phase method, determine the relevant state space of the system 
and draw the corresponding transition graph of the underlying Markov chain. 


(2) Determine the stationary availability of the system. 

9.10) When being in states 0, 1, and 2, a (pure) birth process $\{X(t),\ t \ge 0\}$ with state
space $\mathbf{Z} = \{0, 1, 2, \ldots\}$ has the respective birth rates
$$\lambda_0 = 2, \quad \lambda_1 = 3, \quad \lambda_2 = 1.$$
Given $X(0) = 0$, determine the time-dependent state probabilities $p_i(t) = P(X(t) = i)$
for $i = 0, 1, 2$.
9.11) Consider a linear birth process with state space $\mathbf{Z} = \{0, 1, 2, \ldots\}$ and transition
rates $\lambda_j = j\lambda$, $j = 0, 1, \ldots$


(1) Given $X(0) = 1$, determine the distribution function of the random time point $T_3$
at which the process enters state 3.

(2) Given $X(0) = 1$, determine the mean value of the random time point $T_n$ at which
the process enters state $n$, $n > 1$.


9.12) The number of physical particles of a particular type in a closed container
evolves as follows: There is one particle at time $t = 0$. It splits into two particles of
the same type after an exponential random time $Y$ with parameter $\lambda$ (its lifetime).
These two particles behave in the same way as the original one, i.e., after random
times, which are identically distributed as $Y$, they split into 2 particles each, and so
on. All lifetimes of the particles are assumed to be independent. Let $X(t)$ denote the
number of particles in the container at time $t$.


Determine the absolute state probabilities $p_j(t) = P(X(t) = j)$; $j = 1, 2, \ldots$, of the
stochastic process $\{X(t),\ t \ge 0\}$.


9.13) A death process with state space $\mathbf{Z} = \{0, 1, 2, \ldots\}$ has death rates
$$\mu_0 = 0, \quad \mu_1 = 2, \quad \text{and} \quad \mu_2 = \mu_3 = 1.$$
Given $X(0) = 3$, determine $p_j(t) = P(X(t) = j)$ for $j = 0, 1, 2, 3$.


9.14) A linear death process $\{X(t),\ t \ge 0\}$ has death rates $\mu_j = j\mu$; $j = 0, 1, \ldots$


(1) Given X(0) =2, determine the distribution function of the time to entering state 0 
(‘lifetime' of the process). 

(2) Given X(0) =n, n> 1, determine the mean value of the time at which the process 
enters state 0. 


9.15) At time $t = 0$ there are an infinite number of molecules of type $a$ and $2n$
molecules of type $b$ in a two-component gas mixture. After an exponential random
time with parameter $\lambda$ any molecule of type $b$ combines, independently of the others,
with a molecule of type $a$ to form a molecule $ab$.


(1) What is the probability that at time $t$ there are still $j$ free molecules of type $b$ in
the container?

(2) What is the mean time till there are only $n$ free molecules of type $b$ left in the
container?


9.16) At time $t = 0$ a cable consists of 5 identical, intact wires. The cable is subject to
a constant load of 100 kp such that in the beginning each wire bears a load of 20 kp.
Given a load of $w$ kp per wire, the time to breakage of a wire (its lifetime) is
exponential with mean value
$$\frac{1000}{w} \ \text{[weeks]}.$$


When one or more wires are broken, the load of 100kp is uniformly distributed over 
the remaining intact ones. For any fixed number of wires, their lifetimes are assumed 
to be independent and identically distributed. 


(1) What is the probability that all wires are broken at time t = 50 [weeks]? 
(2) What is the mean time until the cable breaks completely? 


9.17)* Let $\{X(t),\ t \ge 0\}$ be a death process with $X(0) = n$ and positive death rates
$\mu_1, \mu_2, \ldots, \mu_n$.


Prove: If $Y$ is an exponential random variable with parameter $\lambda$ and independent of
the death process, then
$$P(X(Y) = 0) = \prod_{i=1}^{n} \frac{\mu_i}{\mu_i + \lambda}.$$


9.18) A birth and death process has state space $\mathbf{Z} = \{0, 1, \ldots, n\}$ and transition rates
$$\lambda_j = (n - j)\lambda \quad \text{and} \quad \mu_j = j\mu; \quad j = 0, 1, \ldots, n.$$


Determine its stationary state probabilities. 


9.19) Check whether or under what restrictions a birth and death process with
transition rates
$$\lambda_j = \frac{j+1}{j+2}\,\lambda \quad \text{and} \quad \mu_j = \mu; \quad j = 0, 1, \ldots,$$
has a stationary state distribution.


9.20) A birth and death process has transition rates
$$\lambda_j = (j + 1)\lambda \quad \text{and} \quad \mu_j = j^2\mu; \quad j = 0, 1, \ldots; \quad 0 < \lambda < \mu.$$


Confirm that this process has a stationary state distribution and determine it. 


9.21) Consider the following deterministic models for the mean (average) develop- 
ment of the size of populations: 


(1) Let $m(t)$ be the mean number of individuals of a population at time $t$. It is
reasonable to assume that a change of the population size, namely $dm(t)/dt$, is
proportional to $m(t)$, $t \ge 0$, i.e., for a constant $h$ the mean number $m(t)$ satisfies the
differential equation
$$\frac{dm(t)}{dt} = h\,m(t).$$


a) Solve this differential equation assuming m(0) = 1. 


b) Is there a birth and death process the trend function of which has the functional 
structure of m(t)? 




(2) The mean population size $m(t)$ satisfies the differential equation
$$\frac{dm(t)}{dt} = -\mu\,m(t).$$


a) With a positive integer N, solve this equation under the initial condition 
m(0)=N. 


b) Is there a birth and death process the trend function of which has the functional 
structure of m(t)? 


9.22) A computer is connected to three terminals (for example, measuring devices).
It can simultaneously evaluate data records from only two terminals. When the
computer is processing two data records and in the meantime another data record has
been produced, then this new data record has to wait in a buffer if the buffer is
empty. Otherwise, the new data record is lost. The buffer can store only one data
record. The data records are processed according to the FCFS queueing discipline.
The terminals produce data records independently according to a homogeneous
Poisson process with intensity $\lambda$. The processing times of data records from all
terminals are independent, even if the computer is busy with two data records at
the same time, and they have an exponential distribution with parameter $\mu$. They are
assumed to be independent of the input.


Let $X(t)$ be the number of data records in computer and buffer at time $t$.

(1) Verify that $\{X(t),\ t \ge 0\}$ is a birth and death process, determine its transition rates
and draw the transition graph.


(2) Determine the stationary loss probability, i.e., the probability that in the steady 
state a data record is lost. 


9.23) Under otherwise the same assumptions as in exercise 9.22, it is assumed that a 
data record, which has been waiting in the buffer a random patience time, will be 
deleted as being no longer up to date. The patience times of all data records are 
assumed to be independent, exponential random variables with parameter $\nu$. They
are also independent of all arrival and processing times of the data records. 

(1) Draw the transition graph. 


(2) Determine the stationary loss probability. 


9.24) Under otherwise the same assumptions as in exercise 9.22, it is assumed that a 
data record will be deleted when its total sojourn time in the buffer and computer 
exceeds a random time $Z$, where $Z$ has an exponential distribution with parameter $\alpha$.
Thus, the interruption of the current service of a data record is possible. 

(1) Draw the corresponding transition graph. 


(2) Determine the stationary loss probability. 
9.25) A small filling station in a rural area provides diesel for agricultural machines.
It has one diesel pump and waiting capacity for 5 machines. On average, 8 machines
per hour arrive for diesel. An arriving machine immediately leaves the station without
fuel if pump and all waiting places are occupied. The mean time a machine occupies
the pump is 5 minutes. The station behaves like an M/M/s/m queueing system.


(1) Determine the stationary loss probability. 
(2) Determine the stationary probability that an arriving machine waits for diesel. 


9.26) Consider a two-server loss system. Customers arrive according to a homogeneous
Poisson process with intensity $\lambda$. A customer is always served by server 1 when
this server is idle, i.e., an arriving customer goes to server 2 only when server 1
is busy. The service times of both servers are iid exponential random variables with
parameter $\mu$. Let $X(t)$ be the number of customers in the system at time $t$.


Determine the stationary state probabilities of the stochastic process $\{X(t),\ t \ge 0\}$.


9.27) A two-server loss system is subject to a homogeneous Poisson input with
intensity $\lambda$. The situation considered in exercise 9.26 is generalized as follows: If both
servers are idle, a customer goes to server 1 with probability $p$ and to server 2 with
probability $1 - p$. Otherwise, a customer goes to the idle server (if there is any). The
service times of the servers 1 and 2 are independent, exponential random variables
with parameters $\mu_1$ and $\mu_2$, respectively. Arrival and service times are independent.


Describe the behaviour of the system by a suitable homogeneous Markov chain and 
draw the transition graph. 


9.28) A single-server waiting system is subject to a homogeneous Poisson input with
intensity $\lambda = 30\ [h^{-1}]$. If there are not more than 3 customers in the system, the
service times have an exponential distribution with mean $1/\mu = 2$ [min]. If there are more
than 3 customers in the system, the service times are exponential with mean $1/\mu = 1$
[min]. All arrival and service times are independent.

(1) Show that there exists a stationary state distribution and determine it. 

(2) Determine the mean length of the waiting queue in the steady state. 


9.29) Taxis and customers arrive at a taxi rank in accordance with two independent
homogeneous Poisson processes with intensities
$$\lambda_t = 4\ [h^{-1}] \quad \text{and} \quad \lambda_c = 3\ [h^{-1}],$$
respectively. Potential customers who find 2 waiting customers do not wait for
service, but leave the rank immediately. Groups of customers who will use the same
taxi are considered to be one customer. On the other hand, arriving taxis which find
two taxis waiting leave the rank as well.


What is the average number of customers waiting at the rank? 
9.30) A transport company has 4 trucks of the same type. There are 2 maintenance
teams for repairing the trucks after a failure. Each team can repair only one truck at a
time and each failed truck is handled by only one team. The time between failures
of a truck (its lifetime) is exponential with parameter $\lambda$. The repair times are
exponential with parameter $\mu$. All life and repair times are assumed to be independent.
Let $\rho = \lambda/\mu = 0.2$. What is the most efficient way of organizing the work:


(1) to make both maintenance teams responsible for the maintenance of all 4 trucks 
so that any team which is free can repair any failed truck, or 


(2) to assign 2 definite trucks to each team? 


9.31) Ferry boats and customers arrive at a ferry station in accordance with two
independent homogeneous Poisson processes with intensities $\lambda$ and $\mu$, respectively. If
there are $k$ customers at the ferry station when a boat arrives, then it departs with
$\min(k, n)$ passengers ($n$ is the capacity of each boat). If $k > n$, then the remaining
$k - n$ customers wait for the next boat. The sojourn times of the boats at the station
are assumed to be negligibly small.


Model the situation by a suitable homogeneous Markov chain $\{X(t),\ t \ge 0\}$ and draw
the transition graph.


9.32) The life cycle of an organism is controlled by shocks (e.g., accidents, virus
attacks) in the following way: A healthy organism has an exponential lifetime $L$ with
parameter $\lambda_h$. If a shock occurs, the organism falls sick and, when being in this state,
its (residual) lifetime $S$ is exponential with parameter
$$\lambda_s, \quad \lambda_s > \lambda_h.$$
However, a sick organism may recover and return to the healthy state. This occurs in
an exponential time $R$ with parameter $\mu$. If during a period of sickness another shock
occurs, the organism cannot recover and will die a random time $D$ after the
occurrence of the second shock. $D$ is assumed to be exponential with parameter
$$\lambda_d, \quad \lambda_d > \lambda_s.$$
The random variables $L$, $S$, $R$, and $D$ are assumed to be independent.


(1) Describe the evolution in time of the states the organism may be in by a Markov
chain.


(2) Determine the mean lifetime of the organism. 


9.33) Customers arrive at a waiting system of type M/M/1/∞ with intensity $\lambda$. As
long as there are fewer than $n$ customers in the system, the server remains idle. As soon
as the $n$th customer arrives, the server resumes its work and stops working only
when all customers (including newcomers) have been served. After that the server
again waits until the waiting queue has reached length $n$, and so on. Let $1/\mu$ be the
mean service time of a customer and $X(t)$ be the number of customers in the system
at time $t$.

(1) Draw the transition graph of the Markov chain $\{X(t),\ t \ge 0\}$.
(2) Given that $n = 2$, compute the stationary state probabilities. Make sure they exist.


9.34) At time $t = 0$ a computer system consists of $n$ operating computers. As soon as
a computer fails, it is separated from the system by an automatic switching device
with probability $1 - p$. If a failed computer is not separated from the system (this
happens with probability $p$), then the entire system fails. The lifetimes of the
computers are independent and have an exponential distribution with parameter $\lambda$. Thus,
this distribution does not depend on the system state. Provided the switching device has
operated properly when required, the system is available as long as there is at least
one computer available. Let $X(t)$ be the number of computers which are available at
time $t$. By convention, if, due to the switching device, the entire system has failed in
$[0, t)$, then $X(t) = 0$.


(1) Draw the transition graph of the Markov chain $\{X(t),\ t \ge 0\}$.
(2) Given $n = 2$, determine the mean lifetime $E(X_s)$ of the system.


9.35) A waiting-loss system of type M/M/1/2 is subject to two independent Poisson
inputs 1 and 2 with respective intensities $\lambda_1$ and $\lambda_2$, which are referred to as type 1
and type 2 customers. An arriving type 1 customer who finds the server busy and the
waiting places occupied displaces a possible type 2 customer from its waiting place
(such a type 2 customer is lost), but ongoing service of a type 2 customer is not
interrupted. When a type 1 customer and a type 2 customer are waiting, then the type
1 customer will always be served first, regardless of the order of their arrivals. The
service times of type 1 and type 2 customers are independent and have exponential
distributions with respective parameters $\mu_1$ and $\mu_2$.


Describe the behavior of the system by a homogeneous Markov chain, determine the 
transition rates, and draw the transition graph. 


9.36) A queueing network consists of two servers 1 and 2 in series. Server 1 is subject
to a homogeneous Poisson input with intensity $\lambda = 5$ an hour. A customer is lost if
server 1 is busy. From server 1 a customer goes to server 2 for further service. If
server 2 is busy, the customer is lost. The service times of servers 1 and 2 are
exponential with respective mean values
$$1/\mu_1 = 6 \text{ min} \quad \text{and} \quad 1/\mu_2 = 12 \text{ min}.$$
All arrival and service times are independent.


What percentage of customers (with respect to the total input at server 1) is served by 
both servers? 


9.37) A queueing network consists of three nodes (queueing systems) 1, 2, and 3,
each of type M/M/1/∞. The external inputs into the nodes have respective intensities
$$\lambda_1 = 4, \quad \lambda_2 = 8, \quad \text{and} \quad \lambda_3 = 12 \ \text{[customers per hour]}.$$
The respective mean service times at the nodes are
$$4, \ 2, \ \text{and} \ 1 \ \text{[min]}.$$
After having been served by node 1, a customer goes to nodes 2 or 3 with equal
probabilities 0.4 or leaves the system with probability 0.2. From node 2, a customer
goes to node 3 with probability 0.9 or leaves the system with probability 0.1. From
node 3, a customer goes to node 1 with probability 0.2 or leaves the system with
probability 0.8. The external inputs and the service times are independent.


(1) Check whether this queueing network is a Jackson network. 
(2) Determine the stationary state probabilities of the network. 


9.38) A closed queueing network consists of 3 nodes. Each one has 2 servers. There 
are 2 customers in the network. After having been served at a node, a customer goes 
to one of the others with equal probability. All service times are independent random 
variables and have an exponential distribution with parameter $\mu$.


What is the stationary probability to find both customers at the same node? 


9.39) Depending on demand, a conveyor belt operates at 3 different speed levels 1, 2,
and 3. A transition from level $i$ to level $j$ is made with probability $p_{ij}$ with
$$p_{12} = 0.8, \quad p_{13} = 0.2, \quad p_{21} = p_{23} = 0.5, \quad p_{31} = 0.4, \quad p_{32} = 0.6.$$
The respective mean times the conveyor belt operates at levels 1, 2, or 3 between
transitions are
$$\mu_1 = 45, \quad \mu_2 = 30, \quad \text{and} \quad \mu_3 = 12 \ \text{[hours]}.$$


Determine the stationary percentages of time in which the conveyor belt operates at 
levels 1, 2, and 3 by modeling the situation as a semi-Markov chain. 


9.40) The mean lifetime of a system is 620 hours. There are two failure types: Repair- 
ing the system after a type 1-failure requires 20 hours on average and after a type 
2-failure 40 hours on average. 20% of all failures are type 2-failures. There is no 
dependence between the system lifetime and the subsequent failure type. Upon each 
repair the system is 'as good as new'. The repaired system immediately resumes its
work. This process is continued indefinitely. Life- and repair times are independent. 


(1) Describe the situation by a semi-Markov chain with 3 states and draw the transi- 
tion graph of the underlying discrete-time Markov chain. 
(2) Determine the stationary state probabilities of the system. 


9.41)* Under otherwise the same model assumptions as in example 9.25, determine
the stationary probabilities of the states 0, 1, and 2 introduced there on condition that
the service time $B$ is a constant $\mu$; i.e., determine the stationary state probabilities of
the loss system M/D/1/0 with unreliable server.
9.42) A system has two different failure types: type 1 and type 2. After a type $i$
failure the system is said to be in failure state $i$; $i = 1, 2$. The time $L_i$ to a type $i$
failure has an exponential distribution with parameter $\lambda_i$; $i = 1, 2$. Thus, if at time
$t = 0$ a new system starts working, the time to its first failure is
$$Y_0 = \min(L_1, L_2).$$
The random variables $L_1$ and $L_2$ are assumed to be independent. After a type 1
failure, the system is switched from failure state 1 into failure state 2. The respective
mean sojourn times of the system in states 1 and 2 are $\mu_1$ and $\mu_2$. When in state 2,
the system is being renewed. Thus, $\mu_1$ is the mean switching time and $\mu_2$ the mean
renewal time. A renewed system immediately starts working, i.e., the system makes a
transition from state 2 to state 0 with probability 1. This process continues to infinity.
(For motivation, see example 9.7.)

(1) Describe the system behavior by a semi-Markov chain and draw the transition 
graph of the embedded discrete-time Markov chain. 


(2) Determine the stationary availability of the system. 


CHAPTER 10 


Martingales 


10.1 DISCRETE-TIME MARTINGALES 


10.1.1 Definition and Examples 


Martingales are important tools for solving prestigious problems in probability theory
and its applications. Such problems occur in areas such as random walks, point processes,
mathematical statistics, actuarial risk analysis, and mathematics of finance.
Heuristically, martingales are stochastic models for 'fair games' in a wider sense, i.e.,
games in which each side has the same chance to win or to lose. In particular, martingale
is the French word for the game in which a gambler doubles his or her bet on every loss
until he or she wins (Example 10.6). Martingales were introduced as a special class of
stochastic processes by J. Ville and P. Lévy. It was, however, J. L. Doob who recognized
their large theoretical and practical potential and began their systematic
investigation. Martingales as stochastic processes are defined for discrete and
continuous parameter spaces $T$. Analogously to Markov processes, the terminology
discrete-time martingales and continuous-time martingales is adopted. The definition of a
martingale as given in this chapter relies heavily on the concept of the conditional
mean value of a random variable given values of other random variables or, more
generally, on the concept of the conditional mean value of a random variable given
other random variables (see formulas (3.61)-(3.64)).


Definition 10.1 A stochastic process in discrete time $\{X_0, X_1, \ldots\}$ with state space
$\mathbf{Z}$, which satisfies
$$E(|X_n|) < \infty, \quad n = 0, 1, 2, \ldots,$$
is called a (discrete-time) martingale if for all vectors $(x_0, x_1, \ldots, x_n)$ with $x_i \in \mathbf{Z}$
and $n = 0, 1, \ldots$
$$E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) = x_n. \tag{10.1}$$
Under the same assumptions, $\{X_0, X_1, \ldots\}$ is a (discrete-time) supermartingale if
$$E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) \le x_n, \tag{10.2}$$
and a (discrete-time) submartingale if
$$E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) \ge x_n. \tag{10.3}$$
■
If, for instance, the $X_n$ are continuous random variables, then, in view of (3.54)
(page 145), multiplying both sides of the (in)equalities (10.1) to (10.3) by the joint
density of the random vector $(X_0, X_1, \ldots, X_n)$ and integrating over its range yields

Martingale: $E(X_{n+1}) = E(X_n)$; $n = 0, 1, \ldots,$
Supermartingale: $E(X_{n+1}) \le E(X_n)$; $n = 0, 1, \ldots,$
Submartingale: $E(X_{n+1}) \ge E(X_n)$; $n = 0, 1, \ldots$


Thus, the trend function of a martingale is constant,
$$m = E(X_n) = E(X_0); \quad n = 0, 1, \ldots, \tag{10.4}$$
whereas the trend functions of supermartingales (submartingales) are nonincreasing
(nondecreasing) in time. Despite its constant trend function, a martingale need not be a
stationary process. Conditions (10.1) to (10.3) are obviously equivalent to
$$E(X_{n+1} - X_n \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) = 0, \tag{10.5}$$
$$E(X_{n+1} - X_n \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) \le 0, \tag{10.6}$$
$$E(X_{n+1} - X_n \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) \ge 0. \tag{10.7}$$
In particular, a stochastic process $\{X_0, X_1, \ldots\}$ with finite absolute first moments
$E(|X_n|)$, $n = 0, 1, \ldots$, is a martingale if and only if it satisfies condition (10.5).


Since (10.1) is assumed to be true for all vectors $(x_0, x_1, \ldots, x_n)$ with $x_i \in \mathbf{Z}$, another,
equivalent definition of a martingale is
$$E(X_{n+1} \mid X_n, \ldots, X_1, X_0) = X_n \quad \text{or} \quad E(X_{n+1} - X_n \mid X_n, \ldots, X_1, X_0) = 0, \tag{10.8}$$
where the conditional (random) mean values are defined by formula (3.62) with $k = n$
and $Y = X_{n+1}$. The relations in (10.8) are understood to hold with probability 1. This
definition applies analogously to super- and submartingales. From (10.8),
$$E(X_{n+2} \mid X_n, \ldots, X_1, X_0) = E\left[E(X_{n+2} \mid X_{n+1}, X_n, \ldots, X_1, X_0) \mid X_n, \ldots, X_1, X_0\right]$$
$$= E(X_{n+1} \mid X_n, \ldots, X_1, X_0) = X_n.$$
From this one gets by induction: $\{X_0, X_1, \ldots\}$ is a martingale if and only if for all
positive integers $m$
$$E(X_{n+m} \mid X_n, \ldots, X_1, X_0) = X_n,$$
or, equivalently,
$$E(X_{n+m} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) = x_n \quad \text{for all } (x_0, x_1, \ldots, x_n) \text{ with } x_i \in \mathbf{Z}.$$


If $\{X_0, X_1, \ldots\}$ is a martingale and $X_n$ is interpreted as the random fortune of a
gambler at time $n$, then, on condition $X_n = x_n$, the conditional mean fortune of the
gambler at time $n + 1$ is also $x_n$, and this is independent of the development in time of
the fortune of the gambler before $n$ (a fair game with regard to both the gambler and
his or her opponent).
Example 10.1 (sum martingale) Let $\{Y_0, Y_1, \ldots\}$ be a sequence of independent
random variables with $E(|Y_n|) < \infty$ for $n = 0, 1, 2, \ldots$ and $E(Y_n) = 0$ for $n = 1, 2, \ldots$ Then
the sequence $\{X_0, X_1, \ldots\}$ defined by $X_n = Y_0 + Y_1 + \cdots + Y_n$; $n = 0, 1, \ldots$, is a
martingale. The proof is easily established:
$$E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0)$$
$$= E(X_n + Y_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0)$$
$$= x_n + E(Y_{n+1}) = x_n.$$
The sum martingale $\{X_0, X_1, \ldots\}$ can be interpreted as a random walk on the real
axis: $X_n$ is the position of a particle after its $n$th jump or, in other words, its position
at time $n$. The constant trend function of this martingale is
$$m = E(X_n) = E(Y_0); \quad n = 0, 1, \ldots \quad □$$
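The martingale property (10.1) of the sum martingale can be verified by brute force for a small case. The sketch below assumes, purely for illustration, a deterministic start $Y_0 = x_0$ and steps $Y_1, Y_2, \ldots$ that are uniformly distributed on the hypothetical zero-mean set $\{-2, -1, 3\}$. It computes $E(X_{n+1} \mid X_n = x_n, \ldots, X_0 = x_0)$ from the joint distribution of the paths, without using independence in the computation itself.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

STEPS = (-2, -1, 3)  # hypothetical step values, equally likely, mean 0


def conditional_means(x0, n, steps=STEPS):
    """Map each history (x_0, ..., x_n) to E(X_{n+1} | history).

    All paths of n + 1 steps are equally likely, so the conditional mean is
    the average of X_{n+1} over the paths that share the given history.
    """
    totals = defaultdict(Fraction)
    counts = defaultdict(int)
    for increments in product(steps, repeat=n + 1):
        path = [x0]
        for s in increments:
            path.append(path[-1] + s)
        history = tuple(path[:-1])
        totals[history] += path[-1]
        counts[history] += 1
    return {h: totals[h] / counts[h] for h in totals}


means = conditional_means(x0=5, n=3)
print(all(m == history[-1] for history, m in means.items()))
```

Replacing `STEPS` by a set with nonzero mean makes the check fail, in line with the requirement $E(Y_n) = 0$.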


Example 10.2 (product martingale) Let $\{Y_0, Y_1, \ldots\}$ be a sequence of independent,
positive random variables with $E(Y_0) < \infty$, $\mu = E(Y_i) < \infty$ for $i = 1, 2, \ldots$, and
$$X_n = Y_0 Y_1 \cdots Y_n.$$
Then, for $n = 1, 2, \ldots$, since $X_{n+1} = X_n Y_{n+1}$,
$$E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0)$$
$$= E(X_n Y_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0)$$
$$= x_n E(Y_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0)$$
$$= x_n E(Y_{n+1}) = x_n \mu.$$
Thus, $\{X_0, X_1, \ldots\}$ is a supermartingale for $\mu \le 1$ and a submartingale for $\mu \ge 1$. For
$\mu = 1$, the random sequence $\{X_0, X_1, \ldots\}$ is a martingale with trend function
$$m = E(X_n) = E(Y_0), \quad n = 0, 1, \ldots$$
This martingale seems to be a realistic model for describing the development in time
of share prices or other risky assets or derivatives of these (for terminology see
section 11.5.5.2) since, from historical experience, the share price at a time point in the
future is usually proportional to the current price. With this interpretation, $Y_{n+1} - 1$ is
the relative change in the share price over the interval $[n, n+1]$ with regard to $X_n$:
$$\frac{X_{n+1} - X_n}{X_n} = Y_{n+1} - 1; \quad n = 0, 1, \ldots \quad □$$
A further specification of the factors $Y_i$ within the product martingale yields an
exponential type martingale, which is considered in the following example.


Note For notational convenience, in this chapter (super-, sub-) martingales are sometimes
denoted as $\{X_1, X_2, \ldots\}$ instead of $\{X_0, X_1, \ldots\}$.
Example 10.3 (exponential martingale) A special case of the product martingale is
the exponential martingale. Let $\{Z_1, Z_2, \ldots\}$ be a sequence of independent random
variables, identically distributed as $Z$, and let $\theta$ be a real number with
$w(\theta) = E(e^{\theta Z}) < \infty$. Let a sequence of random variables $\{Y_1, Y_2, \ldots\}$ be defined as
$$Y_n = Z_1 + \cdots + Z_n; \quad n = 1, 2, \ldots \tag{10.9}$$
Then the sequence of random variables $\{X_1, X_2, \ldots\}$ with
$$X_n = \frac{e^{\theta Z_1}}{w(\theta)} \cdot \frac{e^{\theta Z_2}}{w(\theta)} \cdots \frac{e^{\theta Z_n}}{w(\theta)} = \frac{e^{\theta Y_n}}{[w(\theta)]^n}; \quad n = 1, 2, \ldots \tag{10.10}$$
is a martingale. This follows immediately from example 10.2, since the factors
$e^{\theta Z_i}/w(\theta)$ in (10.10) are independent and have mean value 1:
$$E\left(\frac{e^{\theta Z_i}}{w(\theta)}\right) = \frac{E(e^{\theta Z_i})}{w(\theta)} = \frac{w(\theta)}{w(\theta)} = 1.$$
In view of its structure, $\{X_1, X_2, \ldots\}$ is called an exponential martingale. If a
parameter $\theta = \theta_0$ exists with $w(\theta_0) = 1$, then the exponential martingale simplifies to
$$\{X_1 = e^{\theta_0 Y_1},\ X_2 = e^{\theta_0 Y_2},\ \ldots\}. \quad □$$
Important special cases of the exponential martingale are:

1) Geometric Random Walk Let $Z$ be a binary random variable with distribution
$$Z = \begin{cases} +1 & \text{with probability } p, \\ -1 & \text{with probability } q, \end{cases} \qquad q = 1 - p \ne 1/2.$$
Then $\{Y_1, Y_2, \ldots\}$ given by (10.9) can be interpreted as a random walk, which starts at
$Y_0 = 0$ and proceeds with steps of size 1 to the right or to the left with probabilities
$p$ and $q$, respectively, $0 < p < 1$. The sequence $\{e^{\theta Y_1}, e^{\theta Y_2}, \ldots\}$ is called a
geometric random walk. In this case,
$$w(\theta) = E(e^{\theta Z}) = p\,e^{\theta} + q\,e^{-\theta}.$$
The geometric random walk is a martingale if $\theta = \ln [q/p]$, since then $w(\theta) = 1$, and
the corresponding exponential martingale $\{X_1, X_2, \ldots\}$ has the structure $X_n = [q/p]^{Y_n}$
with trend function $m(n) = E(X_n) = 1$, $n = 1, 2, \ldots$
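
The constant trend function of the geometric random walk can be checked by direct summation over the binomial distribution of the number of steps to the right. The following is only a minimal sketch; the function name `expected_geometric_walk` is introduced here for illustration and is not part of the text.

```python
from fractions import Fraction
from math import comb


def expected_geometric_walk(p: Fraction, n: int) -> Fraction:
    """Exact value of E[(q/p)**Y_n] for the +/-1 random walk with P(step = +1) = p."""
    q = 1 - p
    ratio = q / p                      # e**theta with theta = ln(q/p)
    total = Fraction(0)
    for k in range(n + 1):             # k = number of +1 steps among the n steps
        y = 2 * k - n                  # position Y_n after k steps up and n - k steps down
        prob = comb(n, k) * p**k * q**(n - k)
        total += prob * ratio**y
    return total
```

For instance, `expected_geometric_walk(Fraction(3, 10), 5)` evaluates the mean of $X_5$ for $p = 0.3$ in exact rational arithmetic, so no rounding error obscures the comparison with the trend function $m(n) = 1$.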
2) Discrete Black-Scholes Model A favorite model for describing the development
of share prices, which are sampled at discrete time points 1, 2, ..., is
$$X_n = S_1 \cdot S_2 \cdots S_n$$
with $S_i = e^{Z_i}$ and independent $Z_i$, identically distributed as $Z = N(\mu, \sigma^2)$; $i = 1, 2, \ldots$
$S_i$ has a logarithmic normal distribution with parameters $\mu$ and $\sigma^2$ (page 84) and
mean value $E(S_i) = e^{\mu + \sigma^2/2}$. Thus, $\{X_1, X_2, \ldots\}$ is a martingale iff $\mu = -\sigma^2/2$.
Example 10.4 (branching process) Consider the Galton-Watson branching process
as introduced on page 370: Each member of the $n$th generation, $n = 0, 1, \ldots$, produces
independently of each other a random number $Y$ of offspring with mean value $\mu$. Let
$X_{n+1}$ be the random number of offspring produced by the $n$th generation. Given
$X_n = x_n$, the random variable $X_{n+1}$ is independent of $X_0, X_1, \ldots, X_{n-1}$. Therefore,
$$E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) = \mu\, x_n. \tag{10.12}$$
Hence, $\{X_0, X_1, \ldots\}$ is a martingale if $\mu = 1$, a supermartingale if $\mu \le 1$, and a
submartingale if $\mu \ge 1$. Moreover, for any positive $\mu$, the sequence $\{Z_0, Z_1, \ldots\}$ with
$Z_n = X_n/\mu^n$ is a martingale. This can be verified as follows:
$$\begin{aligned}
E(Z_{n+1} \mid Z_n = z_n, \ldots, Z_1 = z_1, Z_0 = z_0)
&= E\left(\frac{X_{n+1}}{\mu^{n+1}} \,\Big|\, \frac{X_n}{\mu^n} = \frac{x_n}{\mu^n}, \ldots, \frac{X_1}{\mu} = \frac{x_1}{\mu}, X_0 = x_0\right) \\
&= \frac{1}{\mu^{n+1}}\, E(X_{n+1} \mid X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0) \\
&= \frac{1}{\mu^{n+1}}\, \mu\, x_n = \frac{x_n}{\mu^n} = z_n. \qquad \square
\end{aligned}$$
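
A martingale has a constant trend function, so $E(X_n/\mu^n) = E(X_0)$ must hold for every generation. The sketch below checks this exactly for a small case. The offspring distribution on $\{0, 1, 2\}$, the start value $X_0 = 1$, and all function names are assumptions chosen for illustration only.

```python
from fractions import Fraction


def convolve(a, b):
    """Distribution of the sum of two independent nonnegative integer variables."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def next_generation(dist, offspring):
    """Distribution of X_{n+1}, given the distribution of X_n (lists indexed by value)."""
    result = [Fraction(0)]
    power = [Fraction(1)]              # offspring distribution convolved x times, x = 0
    for x, px in enumerate(dist):
        if x > 0:
            power = convolve(power, offspring)
        if len(power) > len(result):
            result += [Fraction(0)] * (len(power) - len(result))
        for k, v in enumerate(power):
            result[k] += px * v
    return result


def mean(dist):
    return sum(k * pk for k, pk in enumerate(dist))


# Assumed offspring distribution: P(Y=0) = 1/4, P(Y=1) = 1/4, P(Y=2) = 1/2, so mu = 5/4.
offspring = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
mu = mean(offspring)

dist = [Fraction(0), Fraction(1)]      # X_0 = 1 with probability 1
normalised_means = []
for n in range(5):
    normalised_means.append(mean(dist) / mu**n)   # E(Z_n) = E(X_n) / mu**n
    dist = next_generation(dist, offspring)
```

The list `normalised_means` collects $E(Z_0), \ldots, E(Z_4)$. Exact fractions are used so that the comparison with $E(Z_0) = 1$ does not depend on floating-point tolerance.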


10.1.2 Doob-Type Martingales 


In this section, the concept of a (super-, sub-) martingale $\{X_0, X_1, \ldots\}$ as introduced
in definition 10.1 is generalized by conditioning with regard to another sequence of
random variables $\{Y_0, Y_1, \ldots\}$. This, of course, only makes sense if $\{Y_0, Y_1, \ldots\}$ is
somewhat related to $\{X_0, X_1, \ldots\}$. The following definition refers to the characterization
of (super-, sub-) martingales by properties (10.5) to (10.7).


Definition 10.2 Let $\{X_0, X_1, \ldots\}$ and $\{Y_0, Y_1, \ldots\}$ be two discrete-time stochastic
processes. If $E(|X_n|) < \infty$ for all $n = 0, 1, \ldots$, then the random sequence $\{X_0, X_1, \ldots\}$ is
a martingale with regard to $\{Y_0, Y_1, \ldots\}$ or a Doob-type martingale if for all
$(n+1)$-dimensional vectors $(y_0, y_1, \ldots, y_n)$ with $y_i$ elements of the state space of
$\{Y_0, Y_1, \ldots\}$ and for any $n = 0, 1, \ldots$,
$$E(X_{n+1} - X_n \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0) = 0. \tag{10.13}$$
Under otherwise the same assumptions, $\{X_0, X_1, \ldots\}$ is a supermartingale with regard
to $\{Y_0, Y_1, \ldots\}$ if
$$E(X_{n+1} - X_n \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0) \le 0,$$
and a submartingale with regard to $\{Y_0, Y_1, \ldots\}$ if
$$E(X_{n+1} - X_n \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0) \ge 0. \qquad \bullet$$
Remark Most of the literature on martingales is measure-theoretically based. In this case, the
definition of a martingale is usually given by means of the concept of a filtration. Loosely
speaking, a filtration $F_n$ contains all the information that is available about the stochastic
process $\{X_0, X_1, \ldots\}$ up to time point $n$. Generally, since the knowledge about the process
increases with increasing time $n$, $F_0 \subseteq F_1 \subseteq F_2 \subseteq \cdots$. Definition 10.1 uses the natural filtration
$F_n = \{X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n\}$ for characterizing a martingale. Thus, the natural filtration
is simply obtained by observing the process $\{X_0, X_1, \ldots\}$ up to time point $n$. Formally, $F_n$ is the
smallest $\sigma$-algebra generated by the events '$X_i = x_i$', $i = 0, 1, \ldots, n$; see page 18. A filtration $F_n$
may also contain other information than the natural filtration. In particular, in case of Doob-type
martingales, our knowledge about the process $\{X_0, X_1, \ldots\}$ at time point $n$ is given by the
filtration $F_n = \{Y_0 = y_0, Y_1 = y_1, \ldots, Y_n = y_n\}$. The value of $X_n$ is fully determined by the
filtration $F_n$. In measure-theoretic terms, the random variable $X_n$ is measurable with regard to
$F_n$. The random variable $X_{n+1}$, however, is not measurable with regard to $F_n$. Thus, the
martingale terminology can be unified by making use of the concept of a filtration:

A stochastic process $\{X_0, X_1, \ldots\}$ with $E(|X_n|) < \infty$ for all $n = 0, 1, \ldots$ is said to be a martingale
with regard to the sequence of filtrations $\{F_0, F_1, \ldots\}$ if
$$E(X_{n+1} \mid F_n) = X_n, \quad n = 0, 1, \ldots$$


Example 10.5 Let $Y_i$ be the random price of a share at time $i$ and $S_i$ be the amount
of share an investor holds in the interval $[i, i+1)$; $i = 0, 1, \ldots$, $S_i \ge 0$. Thus, at time
$t = 0$ the total value of the investor's amount of shares is $X_0 = Y_0 S_0$, and in the
interval $[i, i+1)$ the investor makes a 'profit' of $S_i (Y_{i+1} - Y_i)$. Hence, the investor's
total profit up to time $t = n$ is
$$X_n = \sum_{i=0}^{n-1} S_i\, (Y_{i+1} - Y_i); \quad n = 1, 2, \ldots \tag{10.14}$$
It makes sense to assume that the investor's choice of what amount of share to hold in
$[n, n+1)$ does not depend on the profit made in this and later intervals, but only on
the profits made in the previous intervals. Hence, $S_n$ is assumed to be fully determined
by $Y_0, Y_1, \ldots, Y_n$, i.e., given these values, $S_n$ is a constant. Under this assumption, the
sequence $\{X_1, X_2, \ldots\}$ is a supermartingale with regard to $\{Y_0, Y_1, \ldots\}$ if $\{Y_0, Y_1, \ldots\}$ is
a supermartingale. This is proved as follows:
$$\begin{aligned}
E(X_{n+1} - X_n \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0)
&= E(S_n (Y_{n+1} - Y_n) \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0) \\
&= S_n\, E(Y_{n+1} - Y_n \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0) \le 0.
\end{aligned}$$
The last line makes use of the assumptions that, given '$Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0$',
the share amount $S_n$ is a constant and that $\{Y_0, Y_1, \ldots\}$ is a supermartingale. Hence,
no matter how carefully the investor fixes the amount of share to be held in an
interval, in the long run she/he cannot expect to make a positive profit if the share price
develops unfavorably. (A supermartingale has a nonincreasing trend function.) □
Example 10.6 The structure of $X_n$ given by (10.14) includes as a special case the
development of winnings (losses) when applying the doubling strategy: Jean bets €1 on
the first game. If he wins, he gets €1. If he loses, his 'winnings' are €-1. Hoping to
equalize the loss, Jean will bet €2 on the next game. If he wins, he will get €2 and,
hence, will have made total winnings of €1. But if he loses, he will have total 'winnings'
of €-3, and will bet €4 on the next game, and so on. After the first win Jean stops
gambling. The following table shows the development of Jean's losses (winnings) if he
loses 5 times in a row and then wins:

game                  1      2      3      4      5      6
result             loss   loss   loss   loss   loss    win
bet (€)               1      2      4      8     16     32
'total winnings'     -1     -3     -7    -15    -31     +1

Generally, if Jean loses the first $n - 1$ games and wins the $n$th game, then his bets are
$$S_i = 2^{i-1}, \quad i = 1, 2, \ldots, n,$$
and at this time point he quits the play with a win of €+1. Hence, at all future time
points $n+1, n+2, \ldots$, Jean's total winnings stay constant at level €+1.


Let $Z_1, Z_2, \ldots$ be a sequence of independent random variables, identically distributed
as $Z$, which indicate whether Jean has won the $i$th game or not:
$$Z_i = \begin{cases} +1 & \text{with probability } 1/2 \text{ (Jean wins)}, \\ -1 & \text{with probability } 1/2 \text{ (Jean loses)}. \end{cases} \tag{10.15}$$
In terms of the $Z_i$, the stopping time $N$ of the play is defined as follows:
$$N = \min_{i = 1, 2, \ldots} \{i,\ Z_i = 1\}.$$
Obviously, $N$ has the geometric distribution (2.26) with $p = 1/2$:
$$p_k = P(N = k) = \left(\tfrac{1}{2}\right)^k, \quad k = 1, 2, \ldots, \quad \text{and} \quad E(N) = 2.$$
Let $X_n$ be the total winnings of Jean at time point $n$. To show that $\{X_1, X_2, \ldots\}$ is a
martingale, equation (10.5) has to be verified:
$$E(X_n - X_{n-1} \mid X_{n-1} = x_{n-1}, \ldots, X_2 = x_2, X_1 = x_1) = 0. \tag{10.16}$$
Let $N = k$. Then the condition '$X_{n-1} = x_{n-1}, \ldots, X_2 = x_2, X_1 = x_1$' in (10.16) can be
deleted, since it is fully characterized by $k$ and $n$. Three cases have to be considered:

1) $n < k$: $X_n = 2^0 Z_1 + 2^1 Z_2 + \cdots + 2^{n-1} Z_n = -1 - 2 - \cdots - 2^{n-1} = 1 - 2^n$
(in view of the geometric series (2.16), page 48),

2) $n = k$: $X_n = 2^0 Z_1 + 2^1 Z_2 + \cdots + 2^{n-2} Z_{n-1} + 2^{n-1} = -1 - 2 - \cdots - 2^{n-2} + 2^{n-1} = 1$,

3) $n > k$: $X_n = 1$ for all $n = k+1, k+2, \ldots$
Therefore, since $X_n - X_{n-1} = 0$ for $N < n$,
$$\begin{aligned}
E(X_n - X_{n-1}) &= E(X_n - X_{n-1} \mid N > n)\, P(N > n) + E(X_n - X_{n-1} \mid N = n)\, P(N = n) \\
&= -2^{n-1} \sum_{i=n+1}^{\infty} \left(\tfrac{1}{2}\right)^i + 2^{n-1} \left(\tfrac{1}{2}\right)^n \\
&= -2^{n-1} \left(\tfrac{1}{2}\right)^{n+1} \sum_{i=0}^{\infty} \left(\tfrac{1}{2}\right)^i + \tfrac{1}{2} \\
&= -2^{n-1} \left(\tfrac{1}{2}\right)^{n+1} \cdot 2 + \tfrac{1}{2} = 0,
\end{aligned}$$
which holds for all $n = 1, 2, \ldots$ (letting $X_0 = 0$). Thus, condition (10.16) is fulfilled so
that $\{X_1, X_2, \ldots\}$ is a martingale. Hence, on average, Jean cannot make a profit when
applying the doubling strategy. This theoretical result is not intuitive at all, since the
probability $p_n$ that at least one of the $Z_i$ in a series of $n$ games assumes value 1 is
$p_n = 1 - 2^{-n}$, and this probability tends to 1 very fast with increasing $n$. To be able
to maintain the doubling strategy until a win, Jean must, however, have a sufficiently
large (theoretically, an unlimited) amount of initial capital, since each bet size $2^i$ has
a positive probability to occur (and the casino must allow arbitrarily large stakes).
'Large' is of course relative in this context, since if Jean starts gambling with an
initial capital of €1 and his first bet size is one cent, then he can maintain at most
6 bets so that his probability of winning one cent is $p_6 \approx 0.984$.
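
The table of the doubling strategy and the remark on the initial capital can be reproduced with a few lines of code. This is a sketch for illustration only; the function names are not from the text, and money is counted in integer units (euros for the table, cents for the capital example).

```python
from fractions import Fraction


def doubling_table(losses_before_win: int):
    """Rows (game, result, bet, total winnings) for a run of losses followed by one win."""
    rows, total = [], 0
    for game in range(1, losses_before_win + 2):
        bet = 2 ** (game - 1)                     # bet size S_i = 2**(i-1)
        won = game == losses_before_win + 1
        total += bet if won else -bet
        rows.append((game, "win" if won else "loss", bet, total))
    return rows


def max_affordable_bets(capital: int, first_bet: int = 1) -> int:
    """Largest n for which n lost bets in a row, first_bet * (2**n - 1), fit into the capital."""
    n = 0
    while first_bet * (2 ** (n + 1) - 1) <= capital:
        n += 1
    return n


def prob_win_within(n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of at least one win in n games: 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n
```

Calling `doubling_table(5)` yields the six columns of the table, and `prob_win_within(max_affordable_bets(100))` gives the probability of winning one cent with a capital of 100 cents.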


Now let us generalize the doubling strategy by assuming that the $Z_i$ are given by
$$Z_i = \begin{cases} +1 & \text{with probability } p, \\ -1 & \text{with probability } q, \end{cases} \qquad q = 1 - p \ne 1/2. \tag{10.17}$$
Then, under otherwise the same assumptions, the mean value of $X_n - X_{n-1}$ becomes
$$E(X_n - X_{n-1}) = -2^{n-1} \sum_{i=n+1}^{\infty} p\, q^{i-1} + 2^{n-1} p\, q^{n-1} = (2q)^{n-1} (p - q), \quad n = 1, 2, \ldots$$
Thus, $\{X_1, X_2, \ldots\}$ is a supermartingale for $p < 1/2$ and a submartingale for $p > 1/2$.
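
The closed form $(2q)^{n-1}(p - q)$ can be compared with the two-term sum it comes from, using $P(N = n) = p\,q^{n-1}$ and $P(N > n) = q^n$. The function name below is an illustrative assumption.

```python
from fractions import Fraction


def mean_increment(n: int, p: Fraction) -> Fraction:
    """E(X_n - X_{n-1}) for the doubling strategy with win probability p per game."""
    q = 1 - p
    bet = 2 ** (n - 1)                 # stake in game n if all previous games were lost
    p_win_now = p * q ** (n - 1)       # P(N = n): first win occurs in game n
    p_still_losing = q ** n            # P(N > n): games 1, ..., n are all lost
    return bet * p_win_now - bet * p_still_losing
```

For $p = 1/2$ the function returns 0 for every $n$, which is the martingale case of example 10.6.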


Even if $\{X_1, X_2, \ldots\}$ is a supermartingale, Jean can make money with the doubling
strategy with any desired probability below 1 if his initial capital is large enough.

To establish the relationship of the doubling strategy to the previous example, let us
introduce the notation $Y_i = Z_1 + Z_2 + \cdots + Z_i$. Then $Y_i - Y_{i-1} = Z_i$ so that
$$E(Y_i - Y_{i-1}) = p - q.$$
Thus, the sequence $\{Y_1, Y_2, \ldots\}$ is a supermartingale if $p < 1/2$. (For extensions of
this example see exercises 10.4 and 10.6.) □
Example 10.7 At time $t = 0$ a population consists of 2 individuals, one of them is of
type 1, the other one of type 2. An individual of type $k$ splits into 2 individuals of
type $k$, $k = 1, 2$. The splitting time is negligibly small. For all individuals, the time to
splitting is a finite random variable. These times to splitting need not be identically
distributed and/or independent. Let $t_1, t_2, \ldots$ be the sequence of time points at which
splittings occur. $\{t_1, t_2, \ldots\}$ is supposed to be a simple point process (page 255).
$\{t_1, t_2, \ldots\}$ becomes a marked point process $\{(t_1, k_1), (t_2, k_2), \ldots\}$, where the marks
$k_i = 1$ or $k_i = 2$ indicate whether an individual of type 1 or type 2 has split at time $t_i$.
No deaths are assumed to occur so that we consider a special branching process.

After each splitting event the number of individuals in the population increases by 1.
Hence, at time $t_n^+ = t_n + 0$ (i.e., immediately after $t_n$) the population comprises a total
number of $n + 2$ individuals, $n = 1, 2, \ldots$ It is assumed that at any time point each
individual has the same probability to split. Let $Y_n$ be the number of individuals of
type 1 at time point $t_n^+$, $Y_0 = 1$. Then $\{Y_0, Y_1, \ldots\}$ is a nonhomogeneous Markov
chain with state space $\{1, 2, \ldots\}$ and transition probabilities
$$p_{i\,i}(n) = P(Y_{n+1} = i \mid Y_n = i) = \frac{n + 2 - i}{n + 2}, \qquad
p_{i\,i+1}(n) = P(Y_{n+1} = i + 1 \mid Y_n = i) = \frac{i}{n + 2}.$$
Note that the conditional mean value of $Y_{n+1}$ on condition $Y_n = y_n$ is
$$E(Y_{n+1} \mid Y_n = y_n) = y_n + 0 \cdot p_{y_n\, y_n}(n) + 1 \cdot p_{y_n\, y_n + 1}(n) = y_n + \frac{y_n}{n + 2}. \tag{10.18}$$
Now let $X_n$ be the fraction of type 1-individuals in the population at time $t_n^+$:
$$X_n = \frac{Y_n}{n + 2}.$$
Then $\{X_0, X_1, \ldots\}$ is a martingale with respect to $\{Y_0, Y_1, \ldots\}$. To prove this, it has to
be shown that
$$E(X_{n+1} \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0) = x_n \quad \text{with } x_n = \frac{y_n}{n + 2}.$$
Since $\{Y_0, Y_1, \ldots\}$ is a Markov chain, the condition '$Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0$' can
be replaced with '$Y_n = y_n$'. Hence, by (10.18),
$$\begin{aligned}
E(X_{n+1} \mid Y_n = y_n) &= E\left(\frac{Y_{n+1}}{n + 3} \,\Big|\, Y_n = y_n\right) = \frac{1}{n + 3}\, E(Y_{n+1} \mid Y_n = y_n) \\
&= \frac{1}{n + 3} \left[ y_n + \frac{y_n}{n + 2} \right] = \frac{y_n}{n + 2} = x_n.
\end{aligned}$$
In the literature, this population model is known as 'Pólya's urn scheme'. □
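
Since a Doob-type martingale has a constant trend function, the mean fraction of type 1-individuals must stay at $E(X_0) = 1/2$. The following sketch propagates the exact distribution of $Y_n$ with the transition probabilities given above. The function names are chosen here for illustration.

```python
from fractions import Fraction


def polya_type1_distribution(n_splits: int):
    """Exact distribution {i: P(Y_n = i)} of the number of type-1 individuals, Y_0 = 1."""
    dist = {1: Fraction(1)}
    for n in range(n_splits):
        total = n + 2                              # population size before split n + 1
        nxt = {}
        for i, prob in dist.items():
            stay = Fraction(total - i, total)      # a type-2 individual splits
            grow = Fraction(i, total)              # a type-1 individual splits
            nxt[i] = nxt.get(i, 0) + prob * stay
            nxt[i + 1] = nxt.get(i + 1, 0) + prob * grow
        dist = nxt
    return dist


def mean_fraction(n_splits: int) -> Fraction:
    """E(X_n) with X_n = Y_n / (n + 2)."""
    dist = polya_type1_distribution(n_splits)
    return sum(prob * Fraction(i, n_splits + 2) for i, prob in dist.items())
```

In this setting the computed distribution of $Y_n$ turns out to be uniform on $\{1, 2, \ldots, n+1\}$, a well-known property of Pólya's urn with one individual of each type at the start.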
Next, under rather strong additional conditions, a criterion is derived, which ensures
that a Doob-type martingale is a martingale in the sense of definition 10.1. This
derivation is facilitated by the introduction of a new concept (Kannan (1979)).


Definition 10.3 Let $\{Y_0, Y_1, \ldots\}$ be a discrete-time Markov chain (not necessarily
homogeneous) with state space $Z = \{\cdots, -1, 0, +1, \cdots\}$ and transition probabilities
$$p_n(y, z) = P(Y_{n+1} = z \mid Y_n = y); \quad y, z \in Z; \ n = 0, 1, \ldots$$
A function $h(y, n)$; $y \in Z$; $n = 0, 1, \ldots$ is said to be concordant with $\{Y_0, Y_1, \ldots\}$ if it
satisfies for all $y \in Z$
$$h(y, n) = \sum_{z \in Z} p_n(y, z)\, h(z, n + 1). \tag{10.19}$$
$\bullet$
Theorem 10.1 Let $\{Y_0, Y_1, \ldots\}$ be a discrete-time Markov chain with state space
$Z = \{\cdots, -1, 0, +1, \cdots\}$.
Then, for any function $h(y, n)$ which is concordant with $\{Y_0, Y_1, \ldots\}$,
a) the sequence of random variables $\{X_0, X_1, \ldots\}$ generated by
$$X_n = h(Y_n, n); \quad n = 0, 1, \ldots$$
is a martingale with regard to $\{Y_0, Y_1, \ldots\}$, and

b) the sequence $\{X_0, X_1, \ldots\}$ is a martingale.

Proof a) By the Markov property and the concordance of $h$ with $\{Y_0, Y_1, \ldots\}$,
$$\begin{aligned}
E(X_{n+1} - X_n \mid Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0)
&= E(X_{n+1} \mid Y_n = y_n, \ldots, Y_0 = y_0) - E(X_n \mid Y_n = y_n, \ldots, Y_0 = y_0) \\
&= E(h(Y_{n+1}, n + 1) \mid Y_n = y_n) - E(h(Y_n, n) \mid Y_n = y_n) \\
&= \sum_{z \in Z} p_n(y_n, z)\, h(z, n + 1) - h(y_n, n) \\
&= h(y_n, n) - h(y_n, n) = 0.
\end{aligned}$$
This result shows that $\{X_0, X_1, \ldots\}$ is a martingale with regard to $\{Y_0, Y_1, \ldots\}$.

b) Let, for given $x_0, x_1, \ldots, x_n$, the random event $A$ be defined as the 'martingale
condition' $A = \{X_n = x_n, \ldots, X_1 = x_1, X_0 = x_0\}$. Since the $X_n$ are fully determined
by $Y_0, Y_1, \ldots, Y_n$, there exists a set $\mathbf{Y}$ of vectors $y = (y_n, \ldots, y_1, y_0)$ with the property
that the occurrence of any of the mutually disjoint random events
$$A_y = \{Y_n = y_n, \ldots, Y_1 = y_1, Y_0 = y_0\}, \quad y \in \mathbf{Y},$$
implies the occurrence of event $A$:
$$A = \bigcup_{y \in \mathbf{Y}} A_y.$$
Now the martingale property of $\{X_0, X_1, \ldots\}$ is easily established. By part a),
$E(X_{n+1} \mid A_y) = h(y_n, n)$, and $h(y_n, n) = x_n$ for every $y \in \mathbf{Y}$, so that
$$E(X_{n+1} \mid A) = \sum_{y \in \mathbf{Y}} E(X_{n+1} \mid A_y)\, \frac{P(A_y)}{P(A)} = h(y_n, n) \sum_{y \in \mathbf{Y}} \frac{P(A_y)}{P(A)} = h(y_n, n) = x_n.$$
Hence, $\{X_0, X_1, \ldots\}$ is a martingale according to definition 10.1. ■


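Example 10.8 (variance martingale) Let $\{Z_1, Z_2, \ldots\}$ be a sequence of independent,
integer-valued random variables with probability distributions
$$q_i^{(n)} = P(Z_n = i), \quad i \in Z = \{\cdots, -1, 0, +1, \cdots\},$$
and numerical parameters $E(Z_i) = 0$ and $E(Z_i^2) = \sigma_i^2$; $i = 1, 2, \ldots$
With an integer-valued constant $z_0$, a discrete-time Markov chain $\{Y_0, Y_1, \ldots\}$ with
state space $Z = \{\cdots, -1, 0, +1, \cdots\}$ is introduced as $Y_n = z_0 + Z_1 + \cdots + Z_n$. Then,
$$E(Y_n) = z_0 \ \text{for } n = 0, 1, \ldots \quad \text{and} \quad \mathrm{Var}(Y_n) = \sum_{i=1}^{n} \sigma_i^2 \ \text{for } n = 1, 2, \ldots$$
The function
$$h(y, n) = y^2 - \sum_{i=1}^{n} \sigma_i^2$$
is concordant with $\{Y_0, Y_1, \ldots\}$. To verify this, let $p_n(y, z)$ be the transition
probabilities of $\{Y_0, Y_1, \ldots\}$ at time $n$. These transition probabilities are fully determined by
the probability distribution of $Z_{n+1}$:
$$p_n(y, z) = P(Y_{n+1} = z \mid Y_n = y) = P(Z_{n+1} = z - y) = q_{z-y}^{(n+1)}; \quad y, z \in Z.$$
Therefore,
$$\begin{aligned}
\sum_{z \in Z} p_n(y, z)\, h(z, n + 1) &= \sum_{z \in Z} q_{z-y}^{(n+1)}\, h(z, n + 1) \\
&= \sum_{z \in Z} q_{z-y}^{(n+1)} \left( z^2 - \sum_{i=1}^{n+1} \sigma_i^2 \right)
 = \sum_{z \in Z} q_{z-y}^{(n+1)} \left[ (z - y + y)^2 - \sum_{i=1}^{n+1} \sigma_i^2 \right] \\
&= \sum_{z \in Z} q_{z-y}^{(n+1)} (z - y)^2 + 2y \sum_{z \in Z} q_{z-y}^{(n+1)} (z - y) + y^2 \sum_{z \in Z} q_{z-y}^{(n+1)} - \sum_{i=1}^{n+1} \sigma_i^2 \\
&= \sigma_{n+1}^2 + 2y \cdot 0 + 1 \cdot y^2 - \sum_{i=1}^{n+1} \sigma_i^2 = y^2 - \sum_{i=1}^{n} \sigma_i^2 = h(y, n).
\end{aligned}$$
Hence, the function $h(y, n)$ is concordant with $\{Y_0, Y_1, \ldots\}$. Thus, by theorem 10.1,
the random sequence $\{X_0, X_1, \ldots\}$ with $X_n$ generated by
$$X_n = Y_n^2 - \mathrm{Var}(Y_n) \tag{10.20}$$
is a martingale. It is called variance martingale. □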
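
The concordance condition (10.19) for $h(y, n) = y^2 - \sum_{i=1}^{n} \sigma_i^2$ can also be checked numerically for concrete step distributions. In the sketch below the three zero-mean distributions of $Z_1, Z_2, Z_3$ and all names are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical zero-mean, integer-valued distributions of Z_1, Z_2, Z_3 as {value: probability}.
STEP_DISTS = [
    {-1: Fraction(1, 2), 1: Fraction(1, 2)},
    {-2: Fraction(1, 3), 1: Fraction(2, 3)},
    {-3: Fraction(1, 8), 0: Fraction(3, 4), 3: Fraction(1, 8)},
]
# sigma_i^2 = E(Z_i^2), because E(Z_i) = 0.
VARIANCES = [sum(p * z * z for z, p in d.items()) for d in STEP_DISTS]


def h(y, n):
    """h(y, n) = y^2 - (sigma_1^2 + ... + sigma_n^2)."""
    return y * y - sum(VARIANCES[:n])


def one_step_average(y, n):
    """Right-hand side of (10.19): sum over z of p_n(y, z) * h(z, n + 1)."""
    return sum(p * h(y + step, n + 1) for step, p in STEP_DISTS[n].items())


def is_concordant(states=range(-5, 6)):
    """Check (10.19) for n = 0, 1, 2 and all start states y in the given range."""
    return all(one_step_average(y, n) == h(y, n)
               for n in range(len(STEP_DISTS)) for y in states)
```

Because the steps need not be identically distributed, the check uses a different distribution for each $n$, in line with the nonhomogeneous Markov chain of the example.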
10.1.3 Martingale Stopping Theorem and Applications


As pointed out in the beginning of this chapter, martingales are suitable stochastic
models for fair games, i.e., the chances to win or to lose are equal. If one bets on a
supermartingale, is it, nevertheless, possible to make money by finishing the game at
the 'right time'? The decision when to finish a game can, of course, only be based on
the past development of the martingale (if no other information is available) and not
on its future. Hence, a proper time for finishing a game seems to be a stopping time
$N$ for $\{X_0, X_1, \ldots\}$, where $X_n$ is the gambler's net profit after the $n$th game. According
to definition 4.2 (page 195), a stopping time for $\{X_0, X_1, \ldots\}$ is a positive,
integer-valued random variable $N$ with the property that the occurrence of the event '$N = n$' is
fully determined by the random variables $X_0, X_1, \ldots, X_n$ and, hence, does not depend
on $X_{n+1}, X_{n+2}, \ldots$ However, the martingale stopping theorem (also called optional
stopping theorem or optional sampling theorem) excludes the possibility of winning
in the long run if finishing the game is controlled by a stopping time (see also
examples 10.5 and 10.6).


Theorem 10.2 (martingale stopping theorem) Let $N$ be a finite stopping time for
the martingale $\{X_0, X_1, \ldots\}$, i.e., $P(N < \infty) = 1$. Then
$$E(X_N) = E(X_0) \tag{10.21}$$
if at least one of the following three conditions is fulfilled:

1) The stopping time $N$ is bounded, i.e., there exists a finite constant $C_1$ so that, with
probability 1, $N \le C_1$. (Of course, in this case $N$ is finite.)

2) There exists a finite constant $C_2$ with
$$|X_{\min(N, n)}| \le C_2 \quad \text{for all } n = 0, 1, \ldots$$

3) $E(|X_N|) < \infty$ and $\lim_{n \to \infty} E(X_n \mid N > n)\, P(N > n) = 0$. ■


Remarks 1) When comparing formulas (10.4) and (10.21), note that in (10.21) $N$ is a
random variable.

2) Example 10.6 shows that (10.21) is not true for all martingales.


Example 10.9 (Wald's identity) Theorem 10.2 implies Wald's identity (4.74) on
condition that $N$ with $E(N) < \infty$ is a stopping time for a sequence of independent random
variables $Y_1, Y_2, \ldots$, identically distributed as $Y$ with $E(Y) < \infty$. To see this, let
$$X_n = \sum_{i=1}^{n} (Y_i - E(Y)); \quad n = 1, 2, \ldots$$
By example 10.1, the sequence $\{X_1, X_2, \ldots\}$ is a martingale. Therefore, theorem 10.2
is applicable with $E(X_1) = 0$:
$$E(X_N) = E\left( \sum_{i=1}^{N} (Y_i - E(Y)) \right) = E\left( \sum_{i=1}^{N} Y_i - N\, E(Y) \right) = E\left( \sum_{i=1}^{N} Y_i \right) - E(N)\, E(Y) = 0.$$
This proves Wald's identity:
$$E\left( \sum_{i=1}^{N} Y_i \right) = E(N)\, E(Y). \tag{10.22}$$
□


Example 10.10 (fair game) Let $\{Z_1, Z_2, \ldots\}$ be a sequence of independent random
variables, identically distributed as $Z$:
$$Z = \begin{cases} +1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2. \end{cases}$$
Since $E(Z_i) = 0$, the sequence $\{Y_1, Y_2, \ldots\}$ defined by
$$Y_n = Z_1 + Z_2 + \cdots + Z_n; \quad n = 1, 2, \ldots$$
is a martingale (example 10.1). $Y_n$ is interpreted as the cumulative net profit (loss) of
a gambler after the $n$th play if he bets one dollar on each play. The gambler finishes
the game as soon as he has won \$$a$ or lost \$$b$. Thus, the game will be finished at time
$$N = \min \{n;\ Y_n = a \ \text{or} \ Y_n = -b\}. \tag{10.23}$$
Obviously, $N$ is a stopping time for the martingale $\{Y_1, Y_2, \ldots\}$. Note that this
martingale is the symmetric random walk. Since $E(N)$ is finite, by equation (10.21),
$$0 = E(Y_1) = E(Y_N) = a\, P(Y_N = a) + (-b)\, P(Y_N = -b).$$
Combining this relationship with
$$P(Y_N = a) + P(Y_N = -b) = 1$$
yields the desired probabilities
$$P(Y_N = a) = \frac{b}{a + b}, \qquad P(Y_N = -b) = \frac{a}{a + b}.$$
For determining $E(N)$, the variance martingale $\{X_1, X_2, \ldots\}$ with
$$X_n = Y_n^2 - \mathrm{Var}(Y_n) = Y_n^2 - n$$
is used (formula (10.20)). By theorem 10.2,
$$E(X_1) = E(X_N) = E(Y_N^2) - E(N) = 0.$$
Therefore,
$$E(N) = E(Y_N^2) = a^2\, P(Y_N = a) + b^2\, P(Y_N = -b).$$
Thus, the mean duration of this fair game is
$$E(N) = a^2\, \frac{b}{a + b} + b^2\, \frac{a}{a + b} = a\,b. \qquad \square$$
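
The results $P(Y_N = a) = b/(a+b)$ and $E(N) = ab$ can be cross-checked by a first-step argument that does not use the stopping theorem. If the walk starts at level $y$ instead of 0, the same formulas with $a - y$ and $b + y$ in place of $a$ and $b$ give the candidates below, and conditioning on the first play yields recursions they must satisfy. The sketch assumes this restart interpretation; the function names are illustrative.

```python
from fractions import Fraction


def win_prob(y: int, a: int, b: int) -> Fraction:
    """Candidate probability of reaching a before -b when the walk starts at y."""
    return Fraction(y + b, a + b)


def mean_duration(y: int, a: int, b: int) -> int:
    """Candidate mean number of plays until a or -b is reached when starting at y."""
    return (a - y) * (y + b)


def first_step_check(a: int, b: int) -> bool:
    """Verify the boundary values and the first-step recursions for -b < y < a."""
    ok = (win_prob(a, a, b) == 1 and win_prob(-b, a, b) == 0
          and mean_duration(a, a, b) == 0 and mean_duration(-b, a, b) == 0)
    for y in range(-b + 1, a):
        up, down = y + 1, y - 1
        ok = ok and win_prob(y, a, b) == (win_prob(up, a, b) + win_prob(down, a, b)) / 2
        ok = ok and mean_duration(y, a, b) == 1 + Fraction(
            mean_duration(up, a, b) + mean_duration(down, a, b), 2)
    return ok
```

For the start value $y = 0$ the two candidates reduce to $b/(a+b)$ and $ab$, in agreement with the example.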
Example 10.11 (unfair game) Under otherwise the same assumptions as in the
previous example, let
$$Z = \begin{cases} +1 & \text{with probability } p, \\ -1 & \text{with probability } q, \end{cases} \qquad q = 1 - p \ne 1/2. \tag{10.24}$$
Thus, the win and loss probabilities on a play are different. The mean value of $Z_i$ is
$$E(Z_i) = p - q = 2p - 1.$$
The martingale $\{X_1, X_2, \ldots\}$ is defined as in example 10.9:
$$X_n = \sum_{i=1}^{n} (Z_i - E(Z_i)); \quad n = 1, 2, \ldots$$
By introducing $Y_n = Z_1 + Z_2 + \cdots + Z_n$, the random variable $X_n$ can be written as
$$X_n = Y_n - (p - q)\, n.$$
If this martingale is stopped at time $N$ given by (10.23), equation (10.21) yields
$$E(X_N) = E(Y_N) - (p - q)\, E(N) = E(X_1) = 0, \tag{10.25}$$
or, equivalently,
$$a\, P(Y_N = a) + (-b)\, P(Y_N = -b) - (p - q)\, E(N) = 0.$$
For establishing another equation for the three unknowns
$$P(Y_N = a), \quad P(Y_N = -b), \quad \text{and} \quad E(N),$$
the exponential martingale (example 10.3) is used. If $\theta = \ln [q/p]$ with $q = 1 - p$, then,
as pointed out in example 10.3, $E(e^{\theta Z_i}) = 1$ so that the geometric random walk
$\{U_1, U_2, \ldots\}$ given by
$$U_n = \prod_{i=1}^{n} e^{\theta Z_i} = e^{\theta \sum_{i=1}^{n} Z_i} = e^{\theta Y_n}, \quad n = 1, 2, \ldots$$
is a martingale. Now, again by applying equation (10.21),
$$1 = E(U_1) = E(U_N) = e^{\theta a}\, P(Y_N = a) + e^{-\theta b}\, P(Y_N = -b). \tag{10.26}$$
Equations (10.25) and (10.26) together with $P(Y_N = a) + P(Y_N = -b) = 1$ yield the
'hitting probabilities'
$$P(Y_N = a) = \frac{1 - (p/q)^b}{(q/p)^a - (p/q)^b}, \qquad P(Y_N = -b) = \frac{(q/p)^a - 1}{(q/p)^a - (p/q)^b}$$
and the mean duration of the game
$$E(N) = \frac{E(Y_N)}{p - q} = \frac{1}{p - q} \cdot \frac{a\,[1 - (p/q)^b] - b\,[(q/p)^a - 1]}{(q/p)^a - (p/q)^b}.$$
By letting $n = b$ and $z = a + b$ one gets the result already obtained in example 8.3
(page 346, formula (8.20)) with elementary methods and without worrying whether
the assumptions of theorem 10.2 are fulfilled. □
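
The hitting probability and the mean duration of the unfair game can be evaluated exactly and cross-checked by conditioning on the first play. The sketch below assumes that restarting the game at level $y$ is the same game with targets $a - y$ and $b + y$; the function names are chosen for illustration.

```python
from fractions import Fraction


def hit_prob_a(p: Fraction, a: int, b: int) -> Fraction:
    """P(Y_N = a): the walk reaches +a before -b, with P(step = +1) = p != 1/2."""
    q = 1 - p
    return (1 - (p / q) ** b) / ((q / p) ** a - (p / q) ** b)


def mean_duration(p: Fraction, a: int, b: int) -> Fraction:
    """E(N) = E(Y_N) / (p - q), as obtained from (10.25)."""
    q = 1 - p
    pa = hit_prob_a(p, a, b)
    return (a * pa - b * (1 - pa)) / (p - q)


def first_step_check(p: Fraction, a: int, b: int) -> bool:
    """Check u(y) = p u(y+1) + q u(y-1) and m(y) = 1 + p m(y+1) + q m(y-1) for -b < y < a."""
    q = 1 - p

    def u(y):                       # hitting probability when starting at level y
        return hit_prob_a(p, a - y, b + y)

    def m(y):                       # mean duration when starting at level y
        return mean_duration(p, a - y, b + y)

    return all(u(y) == p * u(y + 1) + q * u(y - 1)
               and m(y) == 1 + p * m(y + 1) + q * m(y - 1)
               for y in range(-b + 1, a))
```

With `p = Fraction(2, 5)` and `a = b = 1` the game ends after one play, so the functions return the win probability 2/5 and the duration 1.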
10.2 CONTINUOUS-TIME MARTINGALES


This section summarizes some results on continuous-time martingales. For simplicity
and with regard to applications to Brownian motion processes in Chapter 11, their
parameter space is restricted to $T = [0, \infty)$, whereas the state space can be the whole
real axis $Z = (-\infty, +\infty)$ or a subset of it.


Definition 10.4 A stochastic process $\{X(t), t \ge 0\}$ with $E(|X(t)|) < \infty$ for all $t \ge 0$ is
called a martingale if for all integers $n = 0, 1, \ldots$, for every sequence $t_0, t_1, \ldots, t_n$ with
$0 \le t_0 < t_1 < \cdots < t_n$, for all vectors $(x_n, x_{n-1}, \ldots, x_0)$ with $x_i \in Z$, and for any $t > t_n$,
$$E(X(t) \mid X(t_n) = x_n, \ldots, X(t_1) = x_1, X(t_0) = x_0) = x_n. \tag{10.27}$$
$\bullet$


Thus, for predicting the mean value of a martingale at a time $t$, only the last observation
point before $t$ is relevant. The development of the process before $t_n$ contains no
additional information with respect to its mean value at a time $t$, $t > t_n$. Hence,
regardless of how large the difference $t - t_n$ is, on average no increase/decrease of the process
$\{X(t), t \ge 0\}$ can be predicted for the interval $[t_n, t]$.


Analogously to the definition of a discrete-time martingale via (10.8), a continuous-time
martingale can be equivalently defined based on the formulas (3.61) and (3.62)
at page 147: $\{X(t), t \ge 0\}$ is a continuous-time martingale if, with the notation and
assumptions of definition 10.4,
$$E(X(t) \mid X(t_n), \ldots, X(t_1), X(t_0)) = X(t_n). \tag{10.28}$$
This property is frequently written in the more convenient forms
$$E(X(t) \mid X(y),\ y \le s) = X(s), \quad s < t, \tag{10.29}$$
or
$$E(X(t) - X(s) \mid X(y),\ y \le s) = 0, \quad s < t. \tag{10.30}$$
$\{X(t), t \ge 0\}$ is a supermartingale (submartingale) if in (10.27)-(10.30) the sign '$=$'
is replaced with '$\le$' ('$\ge$'). The trend function of a continuous-time martingale is constant:
$$m(t) = E(X(t)) \equiv m(0).$$


Example 10.12 Let $\{N(t), t \ge 0\}$ be a homogeneous Poisson process with intensity $\lambda$,
$\lambda > 0$ (page 261). Then its trend function
$$m(t) = E(N(t)) = \lambda t$$
is increasing so that this process cannot be a martingale. The process $\{X(t), t \ge 0\}$,
however, defined by
$$X(t) = N(t) - \lambda t$$
has trend function $m(t) \equiv 0$ and is indeed a martingale: For $s < t$,
$$\begin{aligned}
E(X(t) - X(s) \mid X(y),\ y \le s) &= E(N(t) - N(s) - \lambda (t - s) \mid N(y),\ y \le s) \\
&= E(N(t) - N(s)) - \lambda (t - s) = 0.
\end{aligned}$$
The condition '$N(y),\ y \le s$' could be deleted, since the homogeneous Poisson process
has independent increments. (Its development in $[0, s]$ has no influence on its
development in $(s, t]$.) Of course, not every stochastic process $\{X(t), t \ge 0\}$ of structure
$X(t) = Y(t) - E(Y(t))$ is a martingale. □


Definition 10.5 (stopping time) A random variable $L$ is a stopping time with regard
to an (arbitrary) stochastic process $\{X(t), t \ge 0\}$ if for all $s > 0$ the occurrence of the
random event '$L \le s$' is fully determined by the evolution of this process up to time
point $s$. Therefore, the occurrence of the random event '$L \le s$' is independent of all
$X(t)$ with $t > s$. $\bullet$


Let $I_{L > t}$ denote the indicator function for the occurrence of the event '$L > t$':
$$I_{L > t} = \begin{cases} 1 & \text{if } L > t \text{ occurs}, \\ 0 & \text{otherwise}. \end{cases}$$

Theorem 10.3 (martingale stopping theorem) If $\{X(t), t \ge 0\}$ is a continuous-time
martingale and $L$ a finite stopping time for this martingale, then
$$E(X(L)) = E(X(0)) \tag{10.31}$$
if one of the following two conditions is fulfilled:
1) $L$ is bounded,
2) $E(|X(L)|) < \infty$ and $\lim_{t \to \infty} E(|X(t)|\, I_{L > t}) = 0$. ■


The interpretation of this theorem is the same as in case of the martingale stopping
theorem for discrete-time martingales. For proofs of theorems 10.2 and 10.3 see, for
instance, Kannan (1979), Grimmett, Stirzaker (2001), or Rolski et al. (1999).


Example 10.13 As an application of theorem 10.3, a proof of Lundberg's inequality
(7.85) in actuarial risk analysis is given: Let $\{R(t), t \ge 0\}$ be the risk process under
the assumptions made at page 294:
$$R(t) = x + \kappa t - C(t),$$
where $x$ is the initial capital, $\kappa$ the premium rate, and $\{C(t), t \ge 0\}$ the compound
claim size process defined by
$$C(t) = \sum_{i=0}^{N(t)} M_i, \quad M_0 = 0,$$
where $\{N(t), t \ge 0\}$ is the homogeneous Poisson process with parameter $\lambda = 1/\mu$.
The claim sizes $M_1, M_2, \ldots$ are assumed to be independent random variables, identically
distributed as $M$, with finite mean $E(M)$ and distribution function and density
$$B(t) = P(M \le t), \quad b(t) = dB(t)/dt, \quad t \ge 0.$$
Let further
$$Y(t) = e^{-r R(t)} \quad \text{and} \quad h(r) = E(e^{r M}) = \int_0^{\infty} e^{r t}\, b(t)\, dt$$
for any positive $r$ with property $h(r) < \infty$. Then
$$\begin{aligned}
E(Y(t)) &= e^{-r (x + \kappa t)}\, E(e^{r C(t)}) \\
&= e^{-r (x + \kappa t)} \sum_{n=0}^{\infty} E(e^{r C(t)} \mid N(t) = n)\, P(N(t) = n) \\
&= e^{-r (x + \kappa t)} \sum_{n=0}^{\infty} [h(r)]^n\, \frac{(\lambda t)^n}{n!}\, e^{-\lambda t} \\
&= e^{-r (x + \kappa t)}\, e^{\lambda t [h(r) - 1]}.
\end{aligned}$$
Let
$$X(t) = \frac{Y(t)}{E(Y(t))} = e^{r C(t) - \lambda t [h(r) - 1]}.$$
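Since $\{C(t), t \ge 0\}$ has independent increments, the ratio
$$\frac{X(t)}{X(s)} = e^{r [C(t) - C(s)] - \lambda (t - s) [h(r) - 1]}, \quad s < t,$$
is independent of the development of the process up to time $s$. Moreover, $C(t) - C(s)$ has
the same distribution as $C(t - s)$, so that, by the calculation above, $E(X(t)/X(s)) = 1$.
Hence, for $s < t$,
$$E(X(t) \mid X(y),\ y \le s) = X(s)\, E\left( \frac{X(t)}{X(s)} \,\Big|\, X(y),\ y \le s \right) = X(s)\, E\left( \frac{X(t)}{X(s)} \right) = X(s).$$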
Thus, $\{X(t), t \ge 0\}$ is a martingale. Now, let
$$L = \inf \{t,\ R(t) < 0\}.$$
$L$ is obviously a stopping time for the martingale $\{X(t), t \ge 0\}$. Therefore, for any
finite $z > 0$, the truncated random variable $L \wedge z = \min(L, z)$ is a bounded stopping
time for $\{X(t), t \ge 0\}$ (exercise 10.11). Hence, theorem 10.3 is applicable with the
stopping time $L \wedge z$:
$$\begin{aligned}
E(X(0)) = 1 &= E(X(L \wedge z)) \\
&= E(X(L \wedge z) \mid L < z)\, P(L < z) + E(X(L \wedge z) \mid L \ge z)\, P(L \ge z) \\
&\ge E(X(L \wedge z) \mid L < z)\, P(L < z) \\
&= E(X(L) \mid L < z)\, P(L < z) \\
&= E(e^{r C(L) - \lambda L [h(r) - 1]} \mid L < z)\, P(L < z).
\end{aligned}$$
The definitions of $R(t)$ and $L$ imply $x + \kappa L < C(L)$. Thus, from the first and the last
line of this derivation,
$$1 > E(e^{r (x + \kappa L) - \lambda L [h(r) - 1]} \mid L < z)\, P(L < z),$$
or, equivalently,
$$1 > e^{r x}\, E(e^{[r \kappa - \lambda (h(r) - 1)] L} \mid L < z)\, P(L < z). \tag{10.32}$$
If the parameter $r$ is chosen in such a way that
$$r \kappa - \lambda [h(r) - 1] = 0, \tag{10.33}$$
then inequality (10.32) simplifies to
$$P(L < z) < e^{-r x}.$$
Since this inequality holds for all finite $z > 0$, it follows that
$$P(L < \infty) \le e^{-r x}. \tag{10.34}$$
The probability $P(L < \infty)$ is obviously nothing else but the ruin probability $p(x)$. On
the other hand, in view of $\lambda = 1/\mu$, equation (10.33) is equivalent to equation (7.94),
which defines the Lundberg coefficient $r$. When verifying this by partial integration of
$$E(e^{r M}) = \int_0^{\infty} e^{r t}\, b(t)\, dt,$$
note that the assumption $h(r) < \infty$ implies
$$\lim_{t \to \infty} e^{r t}\, \overline{B}(t) = 0 \quad \text{with } \overline{B}(t) = 1 - B(t).$$
Thus, (10.34) is indeed the Lundberg inequality (7.85) for the ruin probability. □
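
Equation (10.33) rarely has a closed-form solution, but its positive root can be found numerically. The sketch below uses bisection and, purely as an assumed test case, exponentially distributed claim sizes with mean $\nu$, for which $h(r) = 1/(1 - \nu r)$ for $r < 1/\nu$ and the positive root is $r = 1/\nu - \lambda/\kappa$. The function name, the parameter values, and the claim size distribution are illustrative assumptions, not taken from the text.

```python
import math


def lundberg_coefficient(kappa, lam, h, r_max, tol=1e-12):
    """Positive root r of r*kappa - lam*(h(r) - 1) = 0 on (0, r_max), found by bisection.

    h is the function r -> E(exp(r*M)); r_max is the point where h becomes infinite.
    """
    def f(r):
        return r * kappa - lam * (h(r) - 1.0)

    lo, hi = 1e-9 * r_max, (1.0 - 1e-9) * r_max
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("no sign change: check the condition kappa > lam * E(M)")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:        # f is positive to the left of the root
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Assumed example: exponential claim sizes with mean nu = 2, lam = 1, kappa = 3.
nu, lam, kappa = 2.0, 1.0, 3.0
r = lundberg_coefficient(kappa, lam, lambda s: 1.0 / (1.0 - nu * s), r_max=1.0 / nu)
ruin_bound = math.exp(-r * 10.0)    # upper bound e**(-r*x) for initial capital x = 10
```

Bisection is used because the left-hand side of (10.33) is concave in $r$, vanishes at $r = 0$, and is positive just to the right of 0 when $\kappa > \lambda E(M)$, so it changes sign exactly once on the positive axis.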


10.3 EXERCISES 

10.1) Let $Y_0, Y_1, \ldots$ be a sequence of independent random variables, which are
identically distributed as $N(0, 1)$. Are the stochastic sequences $\{X_0, X_1, \ldots\}$ with
(1) $X_n = \sum_{i=0}^{n} Y_i^2$, (2) $X_n = \sum_{i=0}^{n} Y_i^3$, (3) $X_n = \sum_{i=0}^{n} |Y_i|$; $n = 0, 1, \ldots$, martingales?
10.2) Let $Y_0, Y_1, \ldots$ be a sequence of independent random variables with finite mean
values. Show that the discrete-time stochastic process $\{X_0, X_1, \ldots\}$ generated by
$$X_n = \sum_{i=0}^{n} (Y_i - E(Y_i))$$
is a martingale.

10.3) Let a discrete-time stochastic process $\{X_0, X_1, \ldots\}$ be defined by
$$X_n = Y_0 \cdot Y_1 \cdots Y_n,$$
where the random variables $Y_i$ are independent and have a uniform distribution over
the interval $[0, T]$. Under which conditions is $\{X_0, X_1, \ldots\}$ (1) a martingale, (2) a
submartingale, (3) a supermartingale?
10.4) Determine the mean value of the loss immediately before the win when applying
the doubling strategy, i.e., determine $E(X_{N-1})$ (example 10.6).


10.5) Why is theorem 10.2 not applicable to the sequence of 'winnings' $\{X_1, X_2, \ldots\}$,
which arises by applying the doubling strategy (example 10.6)?


10.6) Jean is not happy with the winnings he can make when applying the 'doubling
strategy'. Hence, under otherwise the same assumptions and notations as in example
10.6, he triples his bet size after every lost game, starting again with €1.

(1) What are his winnings when he loses 5 games in a row and wins the 6th one?

(2) Is $\{X_1, X_2, \ldots\}$ a martingale?


10.7) Starting at value 0, the profit of an investor increases per week by \$1 with
probability $p$, $p > 1/2$, or decreases per week by \$1 with probability $1 - p$. The
weekly increments of the investor's profit are assumed to be independent. Let $N$ be the
random number of weeks until the profit reaches for the first time a positive integer $n$.
By means of Wald's equation, determine $E(N)$.


10.8) Starting at value 0, the fortune of an investor increases per week by \$200 with
probability 3/8, remains constant with probability 3/8, and decreases by \$200 with
probability 2/8. The weekly increments of the investor's fortune are assumed to be
independent. The investor stops the 'game' as soon as he has made a total fortune of
\$2000 or a loss of \$1000, whichever occurs first.

By using suitable martingales and applying the optional stopping theorem, determine

(1) the probability $p_{2000}$ that the investor finishes the 'game' with a profit of \$2000,
(2) the probability $p_{-1000}$ that the investor finishes the 'game' with a loss of \$1000,
(3) the mean duration $E(N)$ of the 'game'.


10.9) Let $X_0$ be uniformly distributed over $[0, T]$, $X_1$ be uniformly distributed over
$[0, X_0]$, and, generally, $X_{i+1}$ be uniformly distributed over $[0, X_i]$, $i = 0, 1, \ldots$

Verify: The sequence $\{X_0, X_1, \ldots\}$ is a supermartingale with $E(X_k) = T/2^{k+1}$; $k = 0, 1, \ldots$

10.10) Let $\{X_1, X_2, \ldots\}$ be a homogeneous discrete-time Markov chain with state
space $Z = \{0, 1, \ldots, n\}$ and transition probabilities
$$p_{ij} = P(X_{k+1} = j \mid X_k = i) = \binom{n}{j} \left( \frac{i}{n} \right)^j \left( \frac{n - i}{n} \right)^{n - j}; \quad i, j \in Z.$$
Show that $\{X_1, X_2, \ldots\}$ is a martingale. (In genetics, this martingale is known as the
Wright-Fisher model without mutation.)
10.11) Show that if $L$ is a stopping time for a stochastic process with discrete or
continuous time and $0 < z < \infty$, then
$$L \wedge z = \min(L, z)$$
is a stopping time for this process as well.

10.12) Let $\{N(t), t \ge 0\}$ be a nonhomogeneous Poisson process with intensity
function $\lambda(t)$ and trend function
$$\Lambda(t) = \int_0^t \lambda(x)\, dx.$$
Check whether the stochastic process $\{X(t), t \ge 0\}$ with $X(t) = N(t) - \Lambda(t)$ is a
martingale.


10.13) Show that every stochastic process $\{X(t), t \in T\}$ satisfying
$$E(|X(t)|) < \infty, \quad t \in T,$$
which has a constant trend function and independent increments, is a martingale.

10.14)* The ruin problem described in section 7.2.7 is modified in the following way:

The risk reserve process $\{R(t), t \ge 0\}$ is only observed at the end of each year (or any
other time unit). The total capital of the insurance company at the end of year $n$ is
$$R(n) = x + \kappa n - \sum_{i=0}^{n} M_i; \quad n = 1, 2, \ldots,$$
where $x$ is the initial capital, $\kappa$ is the constant premium income a year, and $M_i$ is the
total claim size the insurance company has to cover in year $i$, $M_0 = 0$. The random
variables $M_1, M_2, \ldots$ are assumed to be independent and identically distributed as
$$M = N(\mu, \sigma^2) \quad \text{with } \kappa > \mu > 3\sigma.$$
Let $p(x)$ be the ruin probability of the company:
$$p(x) = P(\text{there is an } n = 1, 2, \ldots \text{ so that } R(n) < 0).$$
Show that
$$p(x) \le e^{-2 (\kappa - \mu)\, x / \sigma^2}, \quad x \ge 0.$$

Hint Define $X_n = e^{-s R(n)}$, $n = 0, 1, \ldots$, and select $s$ such that $\{X_0, X_1, \ldots\}$ is a martingale.
Apply theorem 10.2 with the stopping times $L = \min(n,\ R(n) < 0)$ and $L \wedge z$, $0 < z < \infty$.


CHAPTER 11 


Brownian Motion 


11.1 INTRODUCTION 


Tiny organic and inorganic particles when immersed in fluids move randomly along 
zigzag paths. In 1828, the English botanist Robert Brown published a paper in which 
he summarized his observations on this motion and tried to find its physical explana- 
tion. Originally, he was only interested in the behaviour of pollen in liquids in order 
to investigate the fructification process of phanerogams. However, at that time Brown 
could only speculate on the causes of this phenomenon and was at an early stage of 
his research even convinced that he had found an elementary form of life, which is 
common to all particles. Other early explanations refer to attraction and repulsion 
forces between particles, unstable conditions in the fluids in which they are suspend- 
ed, capillary actions, and so on. Although the ceaseless, seemingly chaotic zigzag 
movement of microscopically small particles in fluids had already been observed 
before Brown, it is generally called Brownian motion. 


The first approaches to mathematically modeling the Brownian motion were made by 
L. Bachelier (1900) and A. Einstein (1905). Both found the normal distribution to be
an appropriate model for describing the Brownian motion and gave a physical expla- 
nation of the observed phenomenon: The chaotic movement of sufficiently small par- 
ticles in fluids and in gases is due to the huge number of impacts with the surround- 
ing molecules, even in small time intervals. (Assuming average physical conditions, 
there are about $10^{21}$ collisions per second between a particle and the surrounding
molecules in a fluid.) More precisely, Einstein showed that water molecules could 
momentarily form a compact conglomerate which has sufficient energy to move a 
particle, when banging into it. (Note that the tiny particles are 'giants' compared with 
a molecule.) These bunches of molecules would hit the 'giant' particles from random
directions at random times, causing their apparently irregular zigzag motion. Strangely,
Einstein was obviously not aware of the considerable efforts that had been made
before him to understand the phenomenon of Brownian motion. N. Wiener (1923),
better known as the creator of the science of cybernetics, was the first to present a
general mathematical treatment of the Brownian motion. He defined and analyzed a 
stochastic process, which has served up till now as a stochastic model of Brownian 
motion. Henceforth, this process is called Brownian motion process or, if no misun- 
derstandings are possible, simply Brownian motion. Frequently, mainly in the German 
literature, this process is also referred to as the Wiener process. 


Nowadays the enormous importance of the Brownian motion is above all due to the
fact that it is one of the most powerful tools in the theory and applications of stochastic
modeling, whose role can be compared with that of the normal distribution in probability
theory. The Brownian motion process is an essential ingredient of stochastic
calculus, plays a crucial role in mathematics of finance, is basic for defining one of
the most important classes of Markov processes, the diffusion processes, and for solving
large sample estimation problems in mathematical statistics. Brownian motion
has fruitful applications in key disciplines such as time series analysis, operations research,
communication theory (modeling signals and noise), and reliability theory (wear
modeling, accelerated life testing). This chapter only deals with the one-dimensional
Brownian motion.


Definition 11.1 (Brownian motion) A continuous-time stochastic process $\{B(t), t \ge 0\}$
with state space $Z = (-\infty, +\infty)$ is called a (one-dimensional) Brownian motion (process)
with parameter $\sigma$ if it has the following properties:

1) $B(0) = 0$.
2) $\{B(t), t \ge 0\}$ has homogeneous and independent increments.
3) $B(t)$ has a normal distribution with

$$E(B(t)) = 0 \quad \text{and} \quad \mathrm{Var}(B(t)) = \sigma^2 t, \quad t > 0.$$


Condition 1, namely $B(0) = 0$, is only a normalization and as an assumption not
really necessary. Actually, in what follows situations will arise in which a Brownian
motion is required to start at $B(0) = u \ne 0$. In such a case, the process retains property 2,
but in property 3 the assumption $E(B(t)) = 0$ has to be replaced with $E(B(t)) = u$.
The process $\{B_u(t), t \ge 0\}$ with $B_u(t) = u + B(t)$ is called a shifted Brownian motion.


In view of properties 2 and 3, the increment $B(t) - B(s)$ has a normal distribution with
mean value 0 and variance $\sigma^2 |t - s|$:

$$B(t) - B(s) = N(0, \sigma^2 |t - s|), \quad s, t \ge 0. \tag{11.1}$$

In applications of the Brownian motion to finance, the parameter $\sigma$ is called volatility.
$\sigma^2$ is also called variance parameter since

$$\sigma^2 = \mathrm{Var}(B(1)). \tag{11.2}$$


Figure 11.1 Sample path of the Brownian motion


Standard Brownian Motion If $\sigma = 1$, then $\{B(t), t \ge 0\}$ is called a standard Brownian
motion and will be denoted as $\{S(t), t \ge 0\}$. For any Brownian motion with parameter $\sigma$,

$$B(t) = \sigma S(t). \tag{11.3}$$

Laplace Transform Since $B(t) = N(0, \sigma^2 t)$, because of formula (2.128), page 102,
the Laplace transform of $B(t)$ is

$$E(e^{-\alpha B(t)}) = e^{\frac{1}{2}\alpha^2 \sigma^2 t}. \tag{11.4}$$


11.2 PROPERTIES OF THE BROWNIAN MOTION 


The first problem to be addressed is whether there exists a stochastic process
having properties 1 to 3. An affirmative answer was already given by N. Wiener
in 1923. In what follows, a constructive proof of the existence of the Brownian motion
is given. This is done by showing that Brownian motion can be represented as the
limit of a discrete-time random walk, where the size of the steps tends to 0 and the
number of steps per unit time tends to infinity.


Brownian Motion and Random Walk With respect to the physical interpretation
of the Brownian motion, it is not surprising that there is a close relationship between
Brownian motion and the random walk of a particle along the real axis. Modifying
the random walk described in example 8.1, page 342, it is now assumed that after
every $\Delta t$ time units the particle jumps $\Delta x$ length units to the right or to the left, each
with probability 1/2. Thus, if $X(t)$ is the position of the particle at time $t$, then

$$X(t) = (X_1 + X_2 + \cdots + X_{[t/\Delta t]})\,\Delta x, \tag{11.5}$$

where $X(0) = X_0 = 0$ and

$$X_i = \begin{cases} +1 & \text{if the } i\text{th jump goes to the right,} \\ -1 & \text{if the } i\text{th jump goes to the left.} \end{cases}$$

As usual, $[t/\Delta t]$ denotes the greatest integer less than or equal to $t/\Delta t$. The random
variables $X_i$ are independent of each other and have probability distribution

$$P(X_i = 1) = P(X_i = -1) = 1/2 \quad \text{with} \quad E(X_i) = 0, \ \mathrm{Var}(X_i) = 1.$$

Hence, formula (4.56) on page 187, applied to (11.5), yields

$$E(X(t)) = 0, \quad \mathrm{Var}(X(t)) = (\Delta x)^2\,[t/\Delta t].$$

With a positive constant $\sigma$, let $\Delta x = \sigma\sqrt{\Delta t}$. Then, taking the limit as $\Delta t \to 0$ in (11.5),
a stochastic process in continuous time $\{X(t), t \ge 0\}$ arises which has trend and variance
function

$$E(X(t)) = 0, \quad \mathrm{Var}(X(t)) = \sigma^2 t.$$

Due to its construction, $\{X(t), t \ge 0\}$ has independent and homogeneous increments.
Moreover, by the central limit theorem, $X(t)$ has a normal distribution for all $t > 0$.
Therefore, the stochastic process of the 'infinitesimal random walk' $\{X(t), t \ge 0\}$ is a
Brownian motion.
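
The limiting argument above can be illustrated numerically. The following is only a minimal sketch, not part of the original text: the function names, the number of simulated paths, and the seed are illustrative assumptions. It simulates the scaled random walk (11.5) with $\Delta x = \sigma\sqrt{\Delta t}$ and compares the empirical mean and variance of $X(t)$ with the limiting values 0 and $\sigma^2 t$.

```python
import math
import random


def scaled_random_walk(t, dt, sigma, rng):
    """Position X(t) of the walk (11.5) with step size dx = sigma * sqrt(dt)."""
    dx = sigma * math.sqrt(dt)
    n_steps = math.floor(t / dt)  # [t/dt]: number of jumps made up to time t
    return dx * sum(rng.choice((-1, 1)) for _ in range(n_steps))


def sample_mean_and_variance(t, dt, sigma, n_paths=2000, seed=1):
    """Empirical mean and variance of X(t) over n_paths independent walks."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    xs = [scaled_random_walk(t, dt, sigma, rng) for _ in range(n_paths)]
    mean = sum(xs) / n_paths
    var = sum((x - mean) ** 2 for x in xs) / (n_paths - 1)
    return mean, var


# Illustrative parameters (assumed): t = 2, dt = 0.01, sigma = 1.5.
mean, var = sample_mean_and_variance(t=2.0, dt=0.01, sigma=1.5)
print(mean, var)  # expected to be near 0 and near sigma**2 * t = 4.5
```

Being a Monte Carlo estimate, the printed values only approximate the limits; decreasing `dt` and increasing `n_paths` should tighten the agreement.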


Even after Norbert Wiener, many amazing properties of the Brownian motion have 
been detected. Some of them will be considered in this chapter. The following theo- 
rem summarizes key properties of the Brownian motion. 

Theorem 11.1 A Brownian motion $\{B(t), t \ge 0\}$ has the following properties:

a) $\{B(t), t \ge 0\}$ is mean-square continuous.

b) $\{B(t), t \ge 0\}$ is a martingale.

c) $\{B(t), t \ge 0\}$ is a Markov process.

d) $\{B(t), t \ge 0\}$ is a Gaussian process.


Proof a) From (11.1),

$$E\big((B(t) - B(s))^2\big) = \mathrm{Var}(B(t) - B(s)) = \sigma^2 |t - s|. \tag{11.6}$$

Hence,

$$\lim_{h \to 0} E\big([B(t+h) - B(t)]^2\big) = \lim_{h \to 0} \sigma^2 |h| = 0.$$

Thus, the limit exists with regard to the convergence in mean-square (page 205).


b) Since a Brownian motion $\{B(t), t \ge 0\}$ has independent increments, for $s < t$,

$$\begin{aligned}
E(B(t) \mid B(y),\ y \le s) &= E(B(s) + B(t) - B(s) \mid B(y),\ y \le s) \\
&= B(s) + E(B(t) - B(s) \mid B(y),\ y \le s) \\
&= B(s) + E(B(t) - B(s)) = B(s) + 0 - 0 = B(s).
\end{aligned}$$

Therefore, $\{B(t), t \ge 0\}$ is a martingale.


c) Any stochastic process $\{X(t), t \ge 0\}$ with independent increments is a Markov process.


d) Let $t_1, t_2, \ldots, t_n$ be any sequence of real numbers with $0 < t_1 < t_2 < \cdots < t_n < \infty$. It
has to be shown that for all $n = 1, 2, \ldots$ the random vector

$$(B(t_1), B(t_2), \ldots, B(t_n))$$

has an $n$-dimensional normal distribution. This is an immediate consequence of
theorem 3.3 (page 149), since each $B(t_i)$ can be represented as a sum of independent,
normally distributed random variables (increments) in the following way:

$$B(t_i) = B(t_1) + (B(t_2) - B(t_1)) + \cdots + (B(t_i) - B(t_{i-1})), \quad i = 2, 3, \ldots, n.$$

Theorem 11.2 Let $\{S(t), t \ge 0\}$ be the standard Brownian motion. Then, for any
constant $\alpha \ne 0$, the stochastic processes $\{Y(t), t \ge 0\}$ defined as follows are martingales:

a) $Y(t) = e^{\alpha S(t) - \alpha^2 t/2}$ (exponential martingale),
b) $Y(t) = S^2(t) - t$ (variance martingale).


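Proof a) For $s < t$,

$$\begin{aligned}
E\big(e^{\alpha S(t) - \alpha^2 t/2} \mid S(y),\ y \le s\big) &= E\big(e^{\alpha [S(s) + S(t) - S(s)] - \alpha^2 t/2} \mid S(y),\ y \le s\big) \\
&= e^{\alpha S(s) - \alpha^2 t/2}\, E\big(e^{\alpha [S(t) - S(s)]} \mid S(y),\ y \le s\big) \\
&= e^{\alpha S(s) - \alpha^2 t/2}\, E\big(e^{\alpha [S(t) - S(s)]}\big).
\end{aligned}$$

From (11.4) with $\sigma = 1$,

$$E\big(e^{\alpha [S(t) - S(s)]}\big) = e^{\frac{1}{2}\alpha^2 (t - s)}.$$

Hence,

$$E\big(e^{\alpha S(t) - \alpha^2 t/2} \mid S(y),\ y \le s\big) = e^{\alpha S(s) - \alpha^2 s/2}. \tag{11.7}$$

b) For $s < t$, since $S(s)$ and $S(t) - S(s)$ are independent and $E(S(x)) = 0$ for all $x \ge 0$,

$$\begin{aligned}
E(S^2(t) - t \mid S(y),\ y \le s) &= E\big([S(s) + S(t) - S(s)]^2 - t \mid S(y),\ y \le s\big) \\
&= S^2(s) + E\big\{2 S(s)[S(t) - S(s)] + [S(t) - S(s)]^2 - t \mid S(y),\ y \le s\big\} \\
&= S^2(s) + 0 + E\big\{[S(t) - S(s)]^2\big\} - t \\
&= S^2(s) + (t - s) - t \\
&= S^2(s) - s,
\end{aligned}$$

which proves the assertion.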


There is an obvious analogy between the exponential and the variance martingale 
defined in theorem 11.2 and corresponding discrete-time martingales considered in 
examples 10.3 and 10.8. 

The relationship (11.7) can be used to generate further martingales: Differentiating
(11.7) with regard to $\alpha$ once and twice, respectively, and letting $\alpha = 0$, 'proves' once
more that $\{S(t), t \ge 0\}$ and $\{S^2(t) - t, t \ge 0\}$ are martingales. This algorithm produces
more martingales by differentiating (11.7) $k = 3, 4, \ldots$ times. For instance, when differentiating
(11.7) three and four times, respectively, the resulting martingales are

$$\{S^3(t) - 3tS(t),\ t \ge 0\} \quad \text{and} \quad \{S^4(t) - 6tS^2(t) + 3t^2,\ t \ge 0\}.$$

Properties of the Sample Paths Since a Brownian motion is mean-square continuous,
it is not surprising that its sample paths $b = b(t)$ are continuous functions in $t$.
More exactly, the probability that a sample path of a Brownian motion is continuous
is equal to 1. In view of this, it may come as a surprise that the sample paths of a
Brownian motion are nowhere differentiable. This is not proved here either, but it is
made plausible by using (11.6): For any sample path $b = b(t)$ and any sufficiently
small, but positive $\Delta t$, the difference

$$\Delta b = b(t + \Delta t) - b(t)$$

is approximately equal to $\sigma\sqrt{\Delta t}$. Therefore,

$$\frac{\Delta b}{\Delta t} = \frac{b(t + \Delta t) - b(t)}{\Delta t} \approx \frac{\sigma\sqrt{\Delta t}}{\Delta t} = \frac{\sigma}{\sqrt{\Delta t}}.$$

Hence, for $\Delta t \to 0$, the difference quotient $\Delta b/\Delta t$ is likely to tend to infinity for any
nonnegative $t$. Thus, it can be anticipated that the sample paths of a Brownian motion
are nowhere differentiable; for proofs see, e.g., Kannan (1979). Another example of
a continuous function which is nowhere differentiable is given in Gelbaum and
Olmstead (1990).


The variation of a sample path (as well as of any real function) $b = b(t)$ in an interval
$[0, \tau]$ with $\tau > 0$ is defined as the limit

$$\lim_{n \to \infty} \sum_{k=1}^{2^n} \left| b\!\left(\frac{k\tau}{2^n}\right) - b\!\left(\frac{(k-1)\tau}{2^n}\right) \right|. \tag{11.8}$$

A consequence of the non-differentiability of the sample paths is that this limit, no
matter how small $\tau$ is, cannot be finite. Hence, any sample path of the Brownian motion
is of unbounded variation. This property in its turn implies that the 'length' of a
sample path over the finite interval $[0, \tau]$, and, hence, over any finite interval $[s, t]$
with $s < t$, is infinite. What geometric structure is such a sample path supposed to
have? The most intuitive explanation is that the sample paths of any Brownian motion 
are strongly dentate (in the sense of the structure of leaves), but this structure must 
continue to the infinitesimal. This explanation corresponds to the physical interpreta- 
tion of the Brownian motion: The numerous and rapid bombardments of particles in 
liquids or gases by the surrounding molecules cannot lead to a smooth sample path. 


Unfortunately, the unbounded variation of the sample paths implies that particles 
move with an infinitely large velocity when dispersed in liquids or gases. Hence, the 
Brownian motion process cannot be a mathematically exact model for describing the 
movement of particles in these media. But it is definitely a good approximation. (For 
modeling the velocity of particles in liquids or gases the Ornstein-Uhlenbeck process 
has been developed; see page 511). However, as pointed out in the introduction, 
nowadays the enormous theoretical and practical importance of the Brownian motion 
within the theory of stochastic processes and their applications goes far beyond its 
being a mathematical model for describing the movement of microscopically small 
particles in liquids or gases.

11.3 MULTIDIMENSIONAL AND CONDITIONAL DISTRIBUTIONS


Let $\{B(t), t \ge 0\}$ be a Brownian motion and $f_t(x)$ the density of $B(t)$, $t > 0$. From
property 3 of definition 11.1,

$$f_t(x) = \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2 t}}, \quad t > 0. \tag{11.9}$$


Since the Brownian motion is a Gaussian process, its multidimensional distributions
are multidimensional normal distributions. To determine the parameters of this distribution,
next the joint density $f_{s,t}(x_1, x_2)$ of $(B(s), B(t))$ will be derived.


Because of the independence of the increments of the Brownian motion and in view
of $B(t) - B(s)$ having probability density $f_{t-s}(x)$, for small $\Delta x_1$ and $\Delta x_2$,

$$\begin{aligned}
f_{s,t}(x_1, x_2)\,\Delta x_1 \Delta x_2 &= P(x_1 \le B(s) \le x_1 + \Delta x_1,\ x_2 \le B(t) \le x_2 + \Delta x_2) \\
&= P(x_1 \le B(s) \le x_1 + \Delta x_1,\ x_2 - x_1 \le B(t) - B(s) \le x_2 - x_1 + \Delta x_2 - \Delta x_1) \\
&= f_s(x_1)\, f_{t-s}(x_2 - x_1)\,\Delta x_1 \Delta x_2.
\end{aligned}$$

Hence,

$$f_{s,t}(x_1, x_2) = f_s(x_1)\, f_{t-s}(x_2 - x_1). \tag{11.10}$$


(This derivation can easily be made rigorous.) Substituting (11.9) into (11.10) yields
after some simple algebra

$$f_{s,t}(x_1, x_2) = \frac{1}{2\pi\sigma^2\sqrt{s(t-s)}} \exp\left\{ -\frac{1}{2\sigma^2 s(t-s)} \left( t x_1^2 - 2 s x_1 x_2 + s x_2^2 \right) \right\}. \tag{11.11}$$

Comparing this density with the density of the bivariate normal distribution (3.24) on
page 131 shows that the random vector $(B(s), B(t))$ has a joint normal distribution
with correlation coefficient

$$\rho = +\sqrt{s/t}, \quad 0 < s < t.$$

Therefore, the covariance function of the Brownian motion is

$$C(s, t) = \mathrm{Cov}(B(s), B(t)) = \sigma^2 s, \quad 0 < s \le t. \tag{11.12}$$


In view of the independence of the increments of the Brownian motion, it is easier to
directly determine the covariance function of $\{B(t), t \ge 0\}$: For $0 < s \le t$,

$$\begin{aligned}
C(s, t) = \mathrm{Cov}(B(s), B(t)) &= \mathrm{Cov}(B(s), B(s) + B(t) - B(s)) \\
&= \mathrm{Cov}(B(s), B(s)) + \mathrm{Cov}(B(s), B(t) - B(s)) \\
&= \mathrm{Cov}(B(s), B(s)) = \sigma^2 s.
\end{aligned}$$

Since the roles of $s$ and $t$ can be interchanged, for any $s$ and $t$,

$$C(s, t) = \sigma^2 \min(s, t).$$
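
As a plausibility check of the covariance formula, the pair $(B(s), B(t))$ can be simulated from two independent normal increments, exactly as in the derivation above. This is a hedged sketch only; the function names, sample size, seed, and parameter values are illustrative assumptions and not part of the original text.

```python
import math
import random


def simulate_pair(s, t, sigma, rng):
    """Draw (B(s), B(t)) for 0 < s < t from independent normal increments."""
    b_s = rng.gauss(0.0, sigma * math.sqrt(s))            # B(s) = N(0, sigma^2 s)
    b_t = b_s + rng.gauss(0.0, sigma * math.sqrt(t - s))  # add the increment over [s, t]
    return b_s, b_t


def empirical_covariance(s, t, sigma, n=20000, seed=7):
    """Sample covariance of B(s) and B(t) over n simulated pairs."""
    rng = random.Random(seed)
    pairs = [simulate_pair(s, t, sigma, rng) for _ in range(n)]
    mean_s = sum(p[0] for p in pairs) / n
    mean_t = sum(p[1] for p in pairs) / n
    return sum((p[0] - mean_s) * (p[1] - mean_t) for p in pairs) / (n - 1)


def theoretical_covariance(s, t, sigma):
    """C(s, t) = sigma^2 * min(s, t)."""
    return sigma ** 2 * min(s, t)


# Illustrative values (assumed): s = 1, t = 4, sigma = 2, so C(s, t) = 4.
cov_hat = empirical_covariance(1.0, 4.0, 2.0)
print(cov_hat, theoretical_covariance(1.0, 4.0, 2.0))
```

The estimate fluctuates around $\sigma^2 \min(s, t)$ with a statistical error that shrinks as the sample size grows.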
Let $0 < s < t$. By formula (3.19), page 128, the conditional density of $B(s)$ on
condition $B(t) = b$ is

$$f_{B(s)}(x \mid B(t) = b) = \frac{f_{s,t}(x, b)}{f_t(b)}. \tag{11.13}$$

Substituting (11.9) and (11.11) into (11.13) or immediately making use of formula
(3.25) on page 131 yields

$$f_{B(s)}(x \mid B(t) = b) = \frac{1}{\sqrt{2\pi \frac{s}{t}(t-s)}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2 \frac{s}{t}(t-s)} \left( x - \frac{s}{t}\,b \right)^2 \right\}. \tag{11.14}$$

This is the density of a normally distributed random variable with parameters

$$E(B(s) \mid B(t) = b) = \frac{s}{t}\,b, \quad \mathrm{Var}(B(s) \mid B(t) = b) = \sigma^2\,\frac{s}{t}\,(t-s). \tag{11.15}$$

For fixed $t$, one easily verifies that $\mathrm{Var}(B(s) \mid B(t) = b)$ assumes its maximum at $s = t/2$.
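
The parameters in (11.15) are simple enough to evaluate directly. The short sketch below (function names and the grid search are illustrative assumptions, not part of the original text) computes the conditional mean and variance and locates the maximum of the conditional variance on a grid of $s$-values.

```python
def conditional_mean(s, t, b):
    """E(B(s) | B(t) = b) = (s/t) * b for 0 < s < t, formula (11.15)."""
    return s / t * b


def conditional_variance(s, t, sigma):
    """Var(B(s) | B(t) = b) = sigma^2 * (s/t) * (t - s), formula (11.15)."""
    return sigma ** 2 * s / t * (t - s)


# Illustrative values (assumed): t = 2, sigma = 1, and a grid of 99 interior points.
t, sigma = 2.0, 1.0
grid = [k * t / 100 for k in range(1, 100)]
s_max = max(grid, key=lambda s: conditional_variance(s, t, sigma))
print(s_max)  # the grid point with the largest conditional variance
```

On this grid the maximizer is the midpoint of $[0, t]$, in line with the remark that the variance is largest at $s = t/2$. Note that the conditional variance does not depend on $b$.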


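Let $f_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n)$ be the $n$-dimensional density of the random vector

$$(B(t_1), B(t_2), \ldots, B(t_n)) \quad \text{with} \quad 0 < t_1 < t_2 < \cdots < t_n < \infty.$$

From (11.10), by induction,

$$f_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n) = f_{t_1}(x_1)\, f_{t_2 - t_1}(x_2 - x_1) \cdots f_{t_n - t_{n-1}}(x_n - x_{n-1}).$$

With $f_t(x)$ given by (11.9), the $n$-dimensional joint density becomes

$$f_{t_1, t_2, \ldots, t_n}(x_1, x_2, \ldots, x_n) = \frac{\exp\left\{ -\frac{1}{2\sigma^2} \left[ \frac{x_1^2}{t_1} + \frac{(x_2 - x_1)^2}{t_2 - t_1} + \cdots + \frac{(x_n - x_{n-1})^2}{t_n - t_{n-1}} \right] \right\}}{(2\pi)^{n/2}\, \sigma^n \sqrt{t_1 (t_2 - t_1) \cdots (t_n - t_{n-1})}}. \tag{11.16}$$

Transforming this density analogously to the two-dimensional case shows that (11.16)
has the form (3.66), page 148. This proves once more that the Brownian motion is a
Gaussian process.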

The Brownian motion, as any Gaussian process, is completely determined by its trend
and covariance function. Actually, since the trend function of a Brownian motion is
identically zero, the Brownian motion is completely characterized by its covariance
function. In other words, given $\sigma^2$, there is exactly one Brownian motion process
with covariance function

$$C(s, t) = \sigma^2 \min(s, t).$$

Example 11.1 (Brownian bridge) The Brownian bridge $\{\bar{B}(t), t \in [0, 1]\}$ is a stochastic
process, which is given by the Brownian motion in the interval $[0, 1]$ on condition
that $B(1) = 0$:

$$\bar{B}(t) = B(t), \quad 0 \le t \le 1; \quad B(1) = 0.$$

Letting in (11.14) $b = 0$ and $t = 1$ (and then writing $t$ for $s$) yields the probability density of $\bar{B}(t)$:

$$f_{\bar{B}(t)}(x) = \frac{1}{\sqrt{2\pi t(1-t)}\,\sigma} \exp\left\{ -\frac{x^2}{2\sigma^2 t(1-t)} \right\}, \quad 0 < t < 1.$$

Mean value and variance of $\bar{B}(t)$ are for $0 \le t \le 1$:

$$E(\bar{B}(t)) = 0, \quad \mathrm{Var}(\bar{B}(t)) = \sigma^2 t(1-t).$$

The two-dimensional probability density of the random vector $(\bar{B}(s), \bar{B}(t))$ can be
obtained from

$$f_{t_1, t_2}(x_1, x_2) = \frac{f_{t_1, t_2, t_3}(x_1, x_2, 0)}{f_{t_3}(0)}$$

with $t_1 = s$, $t_2 = t$, and $t_3 = 1$. Hence, for $0 < s < t < 1$, the joint density of the random
vector $(\bar{B}(s), \bar{B}(t))$ is

$$f_{(\bar{B}(s), \bar{B}(t))}(x_1, x_2) = \frac{\exp\left\{ -\frac{1}{2\sigma^2} \left[ \frac{t}{s(t-s)}\,x_1^2 - \frac{2}{t-s}\,x_1 x_2 + \frac{1-s}{(t-s)(1-t)}\,x_2^2 \right] \right\}}{2\pi\sigma^2 \sqrt{s(t-s)(1-t)}}.$$

A comparison with (3.24), page 131, shows that the correlation and the covariance
function of the Brownian bridge are

$$\rho(s, t) = \sqrt{\frac{s(1-t)}{t(1-s)}}, \quad 0 < s < t \le 1,$$

$$C(s, t) = \sigma^2 s(1-t), \quad 0 \le s < t \le 1.$$

The Brownian bridge is a Gaussian process whose trend function is identically 0.
Hence, it is uniquely determined by its covariance function.


The geometric Brownian bridge is defined as the stochastic process $\{Y(t), t \in [0, 1]\}$ with

$$Y(t) = e^{\bar{B}(t)}, \quad 0 \le t \le 1.$$

Both the Brownian bridge and the geometric Brownian bridge have some significance
in modelling stochastically fluctuating parameters in mathematics of finance.
11.4 FIRST PASSAGE TIMES


By definition, the Brownian motion $\{B(t), t \ge 0\}$ starts at $B(0) = 0$. The random time
point $L(x)$, at which the process $\{B(t), t \ge 0\}$ reaches a given level $x$ for the first time,
is called the first passage time or the first hitting time of $\{B(t), t \ge 0\}$ with respect to
level $x$. Since the sample paths of the Brownian motion are continuous functions, $L(x)$
is uniquely characterized by $B(L(x)) = x$ and can, therefore, be defined as

$$L(x) = \min\{t,\ B(t) = x\}, \quad x \in (-\infty, +\infty).$$

Figure 11.2 Illustration of the first passage time and the reflection principle


Next the probability distribution of $L(x)$ is derived on condition $x > 0$: Application
of the total probability rule yields

$$P(B(t) \ge x) = P(B(t) \ge x \mid L(x) \le t)\, P(L(x) \le t) + P(B(t) \ge x \mid L(x) > t)\, P(L(x) > t). \tag{11.17}$$

The second term on the right-hand side of this formula vanishes, since, by definition
of the first passage time,

$$P(B(t) \ge x \mid L(x) > t) = 0$$

for all $t > 0$. For symmetry reasons and in view of $B(L(x)) = x$,

$$P(B(t) \ge x \mid L(x) \le t) = 1/2. \tag{11.18}$$

This situation is illustrated in Figure 11.2: Two sample paths of the Brownian motion,
which coincide up to reaching level $x$ and which after $L(x)$ are mirror symmetric with
respect to the straight line $b(t) = x$, have the same chance of occurring. (The
probability of this event is, nonetheless, zero.) This heuristic argument is known as
the reflection principle. Formulas (11.9), (11.17), and (11.18) yield

$$F_{L(x)}(t) = P(L(x) \le t) = 2\,P(B(t) \ge x) = \frac{2}{\sqrt{2\pi t}\,\sigma} \int_x^{\infty} e^{-\frac{u^2}{2\sigma^2 t}}\,du. \tag{11.19}$$

For symmetry reasons, the probability distributions of the first passage times $L(x)$
and $L(-x)$ are identical for any $x$. Therefore, from (11.19),

$$F_{L(x)}(t) = \frac{2}{\sqrt{2\pi t}\,\sigma} \int_{|x|}^{\infty} e^{-\frac{u^2}{2\sigma^2 t}}\,du, \quad t > 0. \tag{11.20}$$

The relationship of the probability distribution of $L(x)$ to the normal distribution
becomes visible when substituting $y^2 = u^2/(\sigma^2 t)$ in the integral of (11.20):

$$F_{L(x)}(t) = 2\left[ 1 - \Phi\!\left( \frac{|x|}{\sigma\sqrt{t}} \right) \right], \quad t > 0, \tag{11.21}$$

where as usual $\Phi(\cdot)$ is the distribution function of a standard normal random variable.
Differentiation of (11.20) with respect to $t$ yields the probability density of $L(x)$:

$$f_{L(x)}(t) = \frac{|x|}{\sqrt{2\pi}\,\sigma\, t^{3/2}} \exp\left\{ -\frac{x^2}{2\sigma^2 t} \right\}, \quad t > 0. \tag{11.22}$$

Mean value $E(L(x))$ and variance $\mathrm{Var}(L(x))$ do not exist, i.e., they are infinite.

The probability distribution determined by (11.21) or (11.22), respectively, is a
special case of the inverse Gaussian distribution (page 513).
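
Formulas (11.21) and (11.22) translate directly into code. The following is a minimal sketch, assuming the function names chosen here (they are not from the original text) and using `statistics.NormalDist` from the Python standard library for $\Phi$. It also checks numerically that the density is the $t$-derivative of the distribution function.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # distribution function of the standard normal distribution


def first_passage_cdf(x, t, sigma):
    """F_{L(x)}(t) = 2 * [1 - Phi(|x| / (sigma * sqrt(t)))], formula (11.21)."""
    return 2.0 * (1.0 - _PHI(abs(x) / (sigma * math.sqrt(t))))


def first_passage_pdf(x, t, sigma):
    """f_{L(x)}(t) = |x| / (sqrt(2 pi) sigma t^(3/2)) * exp(-x^2 / (2 sigma^2 t)), (11.22)."""
    return (abs(x) / (math.sqrt(2.0 * math.pi) * sigma * t ** 1.5)
            * math.exp(-x * x / (2.0 * sigma ** 2 * t)))


# Central difference of the distribution function at illustrative values x = 1, t = 2, sigma = 1.
h = 1e-5
numeric_density = (first_passage_cdf(1.0, 2.0 + h, 1.0)
                   - first_passage_cdf(1.0, 2.0 - h, 1.0)) / (2.0 * h)
print(numeric_density, first_passage_pdf(1.0, 2.0, 1.0))
```

Both functions depend on $x$ only through $|x|$, which reflects the symmetry of $L(x)$ and $L(-x)$ noted above.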
Maximum Let $M(t)$ be the maximal value the Brownian motion assumes in $[0, t]$:

$$M(t) = \max\{B(s),\ 0 \le s \le t\}. \tag{11.23}$$

The probability distribution of $M(t)$ is obtained as follows:

$$1 - F_{M(t)}(x) = P(M(t) \ge x) = P(L(x) \le t).$$

Hence, by (11.21), the distribution function of $M(t)$ is

$$F_{M(t)}(x) = 2\,\Phi\!\left( \frac{x}{\sigma\sqrt{t}} \right) - 1, \quad t > 0,\ x \ge 0. \tag{11.24}$$

The density of $M(t)$ is obtained by differentiating (11.24) with regard to $x$:

$$f_{M(t)}(x) = \frac{2}{\sqrt{2\pi t}\,\sigma}\, e^{-x^2/(2\sigma^2 t)}, \quad t > 0,\ x \ge 0. \tag{11.25}$$

As a consequence of (11.24), for all finite $x$,

$$\lim_{t \to \infty} P(M(t) < x) = 0. \tag{11.26}$$

Therefore, with probability 1, $\lim_{t \to \infty} M(t) = \infty$. The unbounded growth of $M(t)$ is due
to the linearly increasing variance $\mathrm{Var}(B(t)) = \sigma^2 t$ of the Brownian motion as $t \to \infty$.
Contrary to the Brownian motion, its corresponding 'maximum process' $\{M(t), t \ge 0\}$
has nondecreasing sample paths. This process has applications among others in financial
modeling and in reliability engineering (accelerated life testing, wear modeling).

Example 11.2 A sensor for measuring high temperatures gives an unbiased indication
of the true temperature during its operating time. At the start, the measurement is absolutely
correct. In the course of time, its accuracy deteriorates, but no systematic
errors occur. Let $B(t)$ be the random deviation of the temperature indicated by the sensor
at time $t$ from the true temperature. Historical observations justify the assumption
that $\{B(t), t \ge 0\}$ is a Brownian motion with parameter

$$\sigma = \sqrt{\mathrm{Var}(B(1))} = 0.1 \ [{}^\circ\mathrm{C/day}].$$

What is the probability that within a year (365 days) $B(t)$ exceeds the critical level
$x = +5\,{}^\circ\mathrm{C}$, i.e., that the sensor reads at least once in a year $5\,{}^\circ\mathrm{C}$ high? This probability is

$$F_{L(5)}(365) = P(L(5) < 365) = 2\left[ 1 - \Phi\!\left( \frac{5}{0.1\sqrt{365}} \right) \right] = 2\,[1 - \Phi(2.617)] = 0.009.$$

If the accuracy of the sensor is allowed to exceed the critical value of $+5\,{}^\circ\mathrm{C}$ with
probability 0.05 during its operating time, then the sensor has to be replaced by a
new one after a time $t_{0.95}$ given by $P(L(5) \le t_{0.95}) = 0.05$. According to (11.21),
$t_{0.95}$ satisfies the equation

$$2\left[ 1 - \Phi\!\left( \frac{5}{0.1\sqrt{t_{0.95}}} \right) \right] = 0.05$$

or

$$\frac{5}{0.1\sqrt{t_{0.95}}} = \Phi^{-1}(0.975) = 1.96.$$

Thus, $t_{0.95} = 651$ [days].
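
The two numbers in Example 11.2 can be recomputed from (11.21). The sketch below is illustrative only: the function names are assumptions, and `statistics.NormalDist` from the Python standard library supplies $\Phi$ and $\Phi^{-1}$.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution, mean 0 and standard deviation 1


def exceed_probability(x, t, sigma):
    """P(L(x) <= t) = 2 * [1 - Phi(|x| / (sigma * sqrt(t)))], formula (11.21)."""
    return 2.0 * (1.0 - std_normal.cdf(abs(x) / (sigma * math.sqrt(t))))


def replacement_time(x, p, sigma):
    """Solve 2 * [1 - Phi(|x| / (sigma * sqrt(t)))] = p for t."""
    z = std_normal.inv_cdf(1.0 - p / 2.0)  # e.g. p = 0.05 gives the 0.975-quantile
    return (abs(x) / (sigma * z)) ** 2


p_year = exceed_probability(5.0, 365.0, 0.1)  # level 5 degrees within 365 days
t_95 = replacement_time(5.0, 0.05, 0.1)       # time at which the probability reaches 0.05
print(round(p_year, 3), round(t_95))
```

Up to rounding, the results agree with the values 0.009 and 651 days stated in the example.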


The next example considers a first passage time problem with regard to the Brownian 
motion leaving an interval. 


Example 11.3 Let $L(a, b)$ be the random time at which $\{B(t), t \ge 0\}$ for the first time
hits either value $a$ or value $b$:

$$L(a, b) = \min\{t,\ B(t) = a \text{ or } B(t) = b\}; \quad b < 0 < a.$$

Then the probability $p_{a,b}$ that $\{B(t), t \ge 0\}$ assumes value $a$ before value $b$ is

$$p_{a,b} = P(L(a) < L(b)) = P(L(a, b) = L(a))$$

(Figure 11.3) or, equivalently,

$$p_{a,b} = P(B(L(a, b)) = a).$$

To determine $p_{a,b}$, note that $L(a, b)$ is a stopping time for $\{B(t), t \ge 0\}$. In view of
formula (11.24), $E(L(a, b))$ is finite. Hence, theorem 10.3 is applicable and yields

$$0 = E(B(L(a, b))) = a\,p_{a,b} + b\,(1 - p_{a,b}).$$

Figure 11.3 First passage times with regard to an interval


Therefore, the probability that the Brownian motion hits value $a$ before value $b$ is

$$p_{a,b} = \frac{|b|}{a + |b|}. \tag{11.27}$$

For determining the mean value of $L(a, b)$, the martingale $\{Y(t), t \ge 0\}$ with

$$Y(t) = \frac{1}{\sigma^2} B^2(t) - t$$

is used (theorem 11.2 b) with $S(t) = B(t)/\sigma$). In this case, theorem 10.3 yields

$$0 = E\!\left( \frac{1}{\sigma^2} B^2(L(a, b)) \right) - E(L(a, b)).$$

Hence,

$$E(L(a, b)) = E\!\left( \frac{1}{\sigma^2} B^2(L(a, b)) \right) = \frac{1}{\sigma^2} \left[ p_{a,b}\, a^2 + (1 - p_{a,b})\, b^2 \right]$$

so that, by (11.27),

$$E(L(a, b)) = \frac{1}{\sigma^2}\, a\,|b|. \tag{11.28}$$

As an application of the situation considered in this example, assume that the total
profit which a speculator makes with a certain investment develops according to a
Brownian motion process $\{B(t), t \ge 0\}$, i.e., $B(t)$ is the cumulative 'profit' the speculator
has achieved at time $t$ (possibly negative). If the speculator stops investing after
having achieved a profit of $a$ or after having suffered a loss of $|b|$, then $p_{a,b}$ is the
probability that he finishes with a profit of $a$.


With reference to example 11.2: The probability that the sensor reads $8\,{}^\circ\mathrm{C}$ high
before it reads $4\,{}^\circ\mathrm{C}$ low is $4/(8 + 4) = 1/3$. Or, if in the same example the tolerance
region for $B(t)$ is $[-5\,{}^\circ\mathrm{C}, 5\,{}^\circ\mathrm{C}]$, then $B(t)$ on average leaves this region for the first
time at time

$$E(L) = 25/0.01 = 2500 \ [\text{days}].$$
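
Formulas (11.27) and (11.28) are easy to evaluate. The following sketch (with hypothetical function names that are not part of the original text) recomputes the two numerical statements made at the end of Example 11.3.

```python
def hit_a_first_probability(a, b):
    """p_{a,b} = |b| / (a + |b|) for b < 0 < a, formula (11.27)."""
    if not (b < 0 < a):
        raise ValueError("requires b < 0 < a")
    return abs(b) / (a + abs(b))


def mean_exit_time(a, b, sigma):
    """E(L(a, b)) = a * |b| / sigma^2 for b < 0 < a, formula (11.28)."""
    if not (b < 0 < a):
        raise ValueError("requires b < 0 < a")
    return a * abs(b) / sigma ** 2


# Sensor of Example 11.2: sigma = 0.1.
p_high_first = hit_a_first_probability(8.0, -4.0)  # 8 degrees high before 4 degrees low
mean_days = mean_exit_time(5.0, -5.0, 0.1)         # mean exit time from [-5, 5]
print(p_high_first, mean_days)
```

Note that $p_{a,b}$ does not depend on $\sigma$, whereas the mean exit time is inversely proportional to $\sigma^2$.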
11.5 TRANSFORMATIONS OF THE BROWNIAN MOTION


11.5.1 Identical Transformations 


Transforming the Brownian motion leads to stochastic processes which are important 
in their own right, both from the theoretical and the practical point of view. Some 
transformations again lead to the Brownian motion. Theorem 11.3 compiles three 
transformations of this type (see Lawler (2006)). 


Theorem 11.3 If $\{S(t), t \ge 0\}$ is the standard Brownian motion, then each of the following
stochastic processes is also the standard Brownian motion:

(1) $\{X(t), t \ge 0\}$ with $X(t) = c\,S(t/c^2)$, $c > 0$,

(2) $\{Y(t), t \ge 0\}$ with $Y(t) = S(t + h) - S(h)$, $h > 0$,

(3) $\{Z(t), t \ge 0\}$ with $Z(t) = \begin{cases} t\,S(1/t) & \text{for } t > 0, \\ 0 & \text{for } t = 0. \end{cases}$


Proof The theorem is proved by verifying properties 1) to 3) of definition 11.1. The
processes (1) to (3) start at the origin: $X(0) = Y(0) = Z(0) = 0$. Since the Brownian
motion has independent, normally distributed increments, the processes (1) to (3)
have the same property. Their trend functions are identically zero. Hence, it remains
to show that the increments of the processes (1) to (3) are homogeneous. In view of
(11.1), it suffices to prove that the variances of the increments of the processes (1) to
(3) in any interval $[s, t]$ with $s < t$ are equal to $t - s$. The following derivations make
use of $E(S^2(t)) = t$ and formula (11.12).

(1)
$$\begin{aligned}
\mathrm{Var}(X(t) - X(s)) &= E\big([X(t) - X(s)]^2\big) \\
&= E(X^2(t)) - 2\,\mathrm{Cov}(X(s), X(t)) + E(X^2(s)) \\
&= c^2 \left\{ E(S^2(t/c^2)) - 2\,\mathrm{Cov}[S(s/c^2), S(t/c^2)] + E(S^2(s/c^2)) \right\} \\
&= c^2 \left( \frac{t}{c^2} - 2\,\frac{s}{c^2} + \frac{s}{c^2} \right) = t - s.
\end{aligned}$$

(2)
$$\begin{aligned}
\mathrm{Var}(Y(t) - Y(s)) &= E\big([S(t + h) - S(s + h)]^2\big) \\
&= E(S^2(t + h)) - 2\,\mathrm{Cov}[S(s + h), S(t + h)] + E(S^2(s + h)) \\
&= (t + h) - 2(s + h) + (s + h) = t - s.
\end{aligned}$$

(3)
$$\begin{aligned}
\mathrm{Var}(Z(t) - Z(s)) &= E\big([t\,S(1/t) - s\,S(1/s)]^2\big) \\
&= t^2 E(S^2(1/t)) - 2\,s\,t\,\mathrm{Cov}[S(1/s), S(1/t)] + s^2 E(S^2(1/s)) \\
&= t^2 \cdot \frac{1}{t} - 2\,s\,t \cdot \frac{1}{t} + s^2 \cdot \frac{1}{s} = t - s,
\end{aligned}$$

where the last line uses $\mathrm{Cov}[S(1/s), S(1/t)] = \min(1/s, 1/t) = 1/t$ for $s < t$.

For any Brownian motion $\{B(t), t \ge 0\}$ (see, e.g., Lawler (2006)):

$$P\!\left( \lim_{t \to \infty} \frac{1}{t}\,B(t) = 0 \right) = 1. \tag{11.29}$$

If $t$ is replaced with $1/t$, then taking the limit as $t \to \infty$ is equivalent to taking the
limit as $t \to 0$. Hence,

$$P\!\left( \lim_{t \to 0} t\,B(1/t) = 0 \right) = 1. \tag{11.30}$$

A consequence of (11.29) is that any Brownian motion $\{B(t), t \ge 0\}$ crosses the $t$-axis
with probability 1 at least once in the interval $[s, \infty)$, $s > 0$, and, hence, even countably
infinitely many times. Since

$$\{t\,B(1/t),\ t \ge 0\}$$

is also a Brownian motion, it must have the same property. Therefore, for any $s > 0$,
no matter how small $s$ is, a Brownian motion $\{B(t), t \ge 0\}$ crosses the $t$-axis in $(0, s]$
countably infinitely many times with probability 1.


11.5.2 Reflected Brownian Motion 


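A stochastic process $\{X(t), t \ge 0\}$ defined by $X(t) = |B(t)|$ is called a reflected Brownian
motion (reflected at the $t$-axis). Its trend and variance function are

$$m(t) = E(X(t)) = \frac{2}{\sqrt{2\pi t}\,\sigma} \int_0^{\infty} x\, e^{-\frac{x^2}{2\sigma^2 t}}\,dx = \sigma\sqrt{\frac{2t}{\pi}}, \quad t \ge 0,$$

$$\mathrm{Var}(X(t)) = E(X^2(t)) - [E(X(t))]^2 = \sigma^2 t - \sigma^2\,\frac{2t}{\pi} = (1 - 2/\pi)\,\sigma^2 t.$$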


The reflected Brownian motion is a homogeneous Markov process with state space
$Z = [0, \infty)$. This can be seen as follows: For

$$0 \le t_1 < t_2 < \cdots < t_n < \infty, \quad x_i \in Z,$$

taking into account the Markov property of the Brownian motion and its symmetric
stochastic evolvement with regard to the $t$-axis,

$$\begin{aligned}
P(X(t) \le y \mid X(t_1) = x_1, X(t_2) = x_2, \ldots, X(t_n) = x_n)
&= P(-y \le B(t) \le +y \mid B(t_1) = \pm x_1, B(t_2) = \pm x_2, \ldots, B(t_n) = \pm x_n) \\
&= P(-y \le B(t) \le +y \mid B(t_n) = \pm x_n) \\
&= P(-y \le B(t) \le +y \mid B(t_n) = x_n).
\end{aligned}$$

Hence, for $0 \le s < t$, the transition probabilities

$$P(X(t) \le y \mid X(s) = x)$$

of the reflected Brownian motion are determined by the increment of the Brownian
motion in $[s, t]$ if it starts at time $s$ at state $x$. Because, starting from $x$, the state reached
at time $t$ has an $N(x, \sigma^2\tau)$-distribution with $\tau = t - s$,

$$P(X(t) \le y \mid X(s) = x) = \frac{1}{\sqrt{2\pi\tau}\,\sigma} \int_{-y}^{y} e^{-\frac{(u - x)^2}{2\sigma^2\tau}}\,du,$$

or equivalently

$$P(X(t) \le y \mid X(s) = x) = \Phi\!\left( \frac{y - x}{\sigma\sqrt{\tau}} \right) + \Phi\!\left( \frac{y + x}{\sigma\sqrt{\tau}} \right) - 1; \quad x, y \ge 0; \ \tau = t - s.$$

Since the transition probabilities depend on $s$ and $t$ only via $\tau = t - s$, the reflected
Brownian motion is a homogeneous Markov process.


11.5.3 Geometric Brownian Motion 


A stochastic process $\{X(t), t \ge 0\}$ with

$$X(t) = e^{B(t)} \tag{11.31}$$

is called geometric Brownian motion.


Unlike the Brownian motion, the sample paths of a geometric Brownian motion cannot
become negative. Therefore and for analytical convenience, the geometric Brownian
motion is a favourite tool in mathematics of finance for modeling share prices,
interest rates, and so on.


According to (11.4), the Laplace transform of $B(t)$ is

$$\hat{B}(\alpha) = E(e^{-\alpha B(t)}) = e^{+\frac{1}{2}\alpha^2\sigma^2 t}. \tag{11.32}$$

Substituting in (11.32) the parameter $\alpha$ with a positive integer $n$ yields the moments
of $X(t)$ (the sign of $\alpha$ is immaterial, since the right-hand side of (11.32) depends only on $\alpha^2$):

$$E(X^n(t)) = e^{\frac{1}{2} n^2 \sigma^2 t}, \quad n = 1, 2, \ldots \tag{11.33}$$

In particular, mean value and second moment of $X(t)$ are

$$E(X(t)) = e^{\frac{1}{2}\sigma^2 t}, \quad E(X^2(t)) = e^{2\sigma^2 t}. \tag{11.34}$$

From (11.34) and (1.19):

$$\mathrm{Var}(X(t)) = e^{\sigma^2 t}\,(e^{\sigma^2 t} - 1).$$

Although the trend function of the Brownian motion is constant, the trend function of
the geometric Brownian motion is increasing:

$$m(t) = E(X(t)) = e^{\sigma^2 t/2}, \quad t \ge 0. \tag{11.35}$$
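
The moment formula (11.33) and the resulting variance can be cross-checked numerically. This is only a small sketch with assumed function names and parameter values; it evaluates the closed forms and confirms that $E(X^2(t)) - [E(X(t))]^2$ agrees with $e^{\sigma^2 t}(e^{\sigma^2 t} - 1)$.

```python
import math


def gbm_moment(n, t, sigma):
    """E(X^n(t)) = exp(n^2 * sigma^2 * t / 2) for X(t) = exp(B(t)), formula (11.33)."""
    return math.exp(0.5 * n ** 2 * sigma ** 2 * t)


def gbm_variance(t, sigma):
    """Var(X(t)) computed as second moment minus squared mean, using (11.34)."""
    return gbm_moment(2, t, sigma) - gbm_moment(1, t, sigma) ** 2


def gbm_variance_closed_form(t, sigma):
    """Var(X(t)) = exp(sigma^2 t) * (exp(sigma^2 t) - 1)."""
    return math.exp(sigma ** 2 * t) * (math.exp(sigma ** 2 * t) - 1.0)


# Illustrative values (assumed): t = 1, sigma = 0.5.
print(gbm_variance(1.0, 0.5), gbm_variance_closed_form(1.0, 0.5))
```

The trend function (11.35) is `gbm_moment(1, t, sigma)`, which increases in $t$ for every $\sigma > 0$.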
11.5.4 Ornstein-Uhlenbeck Process


As mentioned before, if the Brownian motion process were the absolutely correct
model for describing the movements of particles in liquids or gases, the particles would
have to move with an infinitely large velocity. To overcome this unrealistic assumption,
Uhlenbeck and Ornstein (1930) developed a stochastic process for modeling the velocity
of tiny particles in liquids and gases. Now this process is used as a stochastic
model in other applications as well, e.g., in finance and population dynamics.


Definition 11.2 Let $\{B(t), t \ge 0\}$ be a Brownian motion with parameter $\sigma$. Then the
stochastic process $\{U(t), t \in (-\infty, +\infty)\}$ defined by

$$U(t) = e^{-\alpha t}\,B(e^{2\alpha t}) \tag{11.36}$$

is said to be an Ornstein-Uhlenbeck process with parameters $\sigma$ and $\alpha$, $\alpha > 0$.


Thus, the stationary Ornstein-Uhlenbeck process arises from the nonstationary Brownian
motion by time transformation and standardization.


The density of $U(t)$ is easily derived from (11.9):

$$f_{U(t)}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)}, \quad -\infty < x < \infty.$$

Thus, $U(t)$ has a normal distribution with parameters

$$E(U(t)) = 0, \quad \mathrm{Var}(U(t)) = \sigma^2.$$

Hence, the trend function of the Ornstein-Uhlenbeck process is identically zero, and
$U(t)$ is standard normal if $\{B(t), t \ge 0\}$ is the standard Brownian motion.


Since $\{B(t), t \ge 0\}$ is a Gaussian process, the Ornstein-Uhlenbeck process has the
same property. (This is a corollary from theorem 3.3, page 149.) Hence, the multidimensional
distributions of the Ornstein-Uhlenbeck process are multidimensional
normal distributions. Moreover, there is a unique correspondence between the sample
paths of the Brownian motion and the sample paths of the corresponding Ornstein-Uhlenbeck
process. Thus, the Ornstein-Uhlenbeck process, as the Brownian motion,
is a Markov process. The covariance function of the Ornstein-Uhlenbeck process is

$$C(s, t) = \sigma^2 e^{-\alpha(t - s)}, \quad s \le t. \tag{11.37}$$

This can be seen as follows: For $s \le t$, in view of (11.12),

$$\begin{aligned}
C(s, t) = \mathrm{Cov}(U(s), U(t)) &= E(U(s)\,U(t)) \\
&= e^{-\alpha(s + t)}\, E\big(B(e^{2\alpha s})\,B(e^{2\alpha t})\big) \\
&= e^{-\alpha(s + t)}\, \mathrm{Cov}\big(B(e^{2\alpha s}), B(e^{2\alpha t})\big) = e^{-\alpha(s + t)}\,\sigma^2 e^{2\alpha s} \\
&= \sigma^2 e^{-\alpha(t - s)}.
\end{aligned}$$
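
The derivation of (11.37) can be retraced numerically by composing the time transformation in (11.36) with the covariance function $\sigma^2 \min(s, t)$ of the Brownian motion. The sketch below uses assumed function names and illustrative parameter values; it is not part of the original text.

```python
import math


def bm_cov(s, t, sigma):
    """Covariance function of the Brownian motion: sigma^2 * min(s, t)."""
    return sigma ** 2 * min(s, t)


def ou_cov(s, t, sigma, alpha):
    """Cov(U(s), U(t)) obtained from U(t) = exp(-alpha t) * B(exp(2 alpha t))."""
    return math.exp(-alpha * (s + t)) * bm_cov(
        math.exp(2.0 * alpha * s), math.exp(2.0 * alpha * t), sigma)


def ou_cov_closed_form(s, t, sigma, alpha):
    """sigma^2 * exp(-alpha * |t - s|), formula (11.37) written symmetrically in s and t."""
    return sigma ** 2 * math.exp(-alpha * abs(t - s))


# Illustrative values (assumed): sigma = 2, alpha = 0.7; the time points may be negative.
for s, t in [(0.0, 1.0), (-3.0, -2.0), (1.5, 4.0)]:
    print(ou_cov(s, t, 2.0, 0.7), ou_cov_closed_form(s, t, 2.0, 0.7))
```

The pairs $(0, 1)$ and $(-3, -2)$ give the same covariance, since the result depends on $s$ and $t$ only through $t - s$.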
Corollary The Ornstein-Uhlenbeck process is weakly stationary. As a Gaussian process,
it is also strongly stationary.


In contrast to the Brownian motion, the Ornstein-Uhlenbeck process has the follow- 
ing properties: 

1) The increments of the Ornstein-Uhlenbeck process are not independent. 

2) The Ornstein-Uhlenbeck process is mean-square differentiable. 


11.5.5 Brownian Motion with Drift 


11.5.5.1 Definitions and First Passage Times 

Definition 11.3 A stochastic process $\{D(t), t \ge 0\}$ is called Brownian motion with
drift if it has the following properties:

1) $D(0) = 0$,

2) $\{D(t), t \ge 0\}$ has homogeneous, independent increments,

3) Every increment $D(t) - D(s)$ has a normal distribution with mean value $\mu(t-s)$
and variance $\sigma^2 |t-s|$.


An equivalent definition of the Brownian motion with drift is:
$\{D(t), t \ge 0\}$ is a Brownian motion with drift if and only if $D(t)$ has structure

$$D(t) = \mu t + B(t), \tag{11.38}$$

where $\{B(t), t \ge 0\}$ is the Brownian motion with variance parameter $\sigma^2$. The constant
$\mu$ is called drift parameter or simply drift. Thus, a Brownian motion with drift arises
by superimposing a Brownian motion on a deterministic function. This deterministic
function is a straight line and coincides with the trend function of the Brownian
motion with drift:

$$m(t) = E(D(t)) = \mu t.$$

If properties 2) and 3) are fulfilled, but the process starts at time $t = 0$ at level $u$,
$u \ne 0$, then the resulting stochastic process $\{D_u(t), t \ge 0\}$ is called a shifted Brownian
motion with drift. $D_u(t)$ has structure

$$D_u(t) = u + D(t).$$


The one-dimensional density functions of the Brownian motion with drift are 


$$f_{D(t)}(x) = \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-\frac{(x-\mu t)^2}{2\sigma^2 t}}, \quad -\infty < x < \infty,\ t > 0. \tag{11.39}$$

Brownian motion processes with drift are, amongst other applications, used for model- 
ing financial parameters, productivity criteria, cumulative maintenance costs, wear 
modeling as well as for modeling physical noise. Generally speaking, the Brownian 
motion with drift can successfully be applied to modeling situations in which causally
linear processes are permanently disturbed by random influences. In view of these
applications it is not surprising that first passage times of Brownian motions with
drift play an important role both with respect to theory and practice.


Figure 11.4 Sample path of a Brownian motion with positive drift (sample path $d(t)$ and trend function $m(t) = \mu t$ plotted against $t$)


Let $L(x)$ be the first passage time of $\{D(t), t \ge 0\}$ with regard to level $x$. Then,

$$L(x) = \min\{t,\ D(t) = x\}, \quad x \in (-\infty, +\infty).$$

Since every Brownian motion with drift has independent increments and is a Gaussian
process, the following relationship between the probability densities of $D(t)$ and
$L(x)$ holds (Franz (1977)):

$$f_{L(x)}(t) = \frac{x}{t}\, f_{D(t)}(x), \quad x > 0,\ \mu > 0.$$

Hence, the probability density of $L(x)$ is

$$f_{L(x)}(t) = \frac{x}{\sqrt{2\pi}\,\sigma\, t^{3/2}} \exp\left\{-\frac{(x-\mu t)^2}{2\sigma^2 t}\right\}, \quad t > 0. \tag{11.40}$$


(See also Scheike (1992) for a direct proof of this result.) For symmetry reasons, the
probability density of the first passage time $L(x)$ of a Brownian motion with drift
starting at $u$ can be obtained from (11.40) by replacing $x$ there with $x - u$.

The probability distribution given by the density (11.40) is the inverse Gaussian
distribution with parameters $\mu$, $\sigma^2$, and $x$. (Replace in (2.89), page 85, the parameters
$\alpha$ with $x^2/\sigma^2$ and $\beta$ with $x/\mu$ to obtain (11.40).) Contrary to the first passage time of
the Brownian motion, now mean value and variance of $L(x)$ exist:

$$E(L(x)) = \frac{x}{\mu}, \quad \operatorname{Var}(L(x)) = \frac{x\,\sigma^2}{\mu^3}, \quad \mu > 0. \tag{11.41}$$

For $\mu = 0$, the density (11.40) simplifies to the first passage time density (11.20) of
the Brownian motion. If $x < 0$ and $\mu < 0$, formula (11.40) yields the density of the
corresponding first passage time $L(x)$ by substituting $|x|$ and $|\mu|$ for $x$ and $\mu$, respectively.
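The moments in (11.41) can be checked against the density (11.40) by numerical integration. The following is a minimal sketch using only the standard library. The composite midpoint rule, the truncation point of the integral and the parameter values are assumptions chosen for illustration.

```python
import math

def first_passage_density(t, x, mu, sigma):
    """Density (11.40) of the first passage time L(x), x > 0, mu > 0, t > 0."""
    return (x / (math.sqrt(2 * math.pi) * sigma * t**1.5)
            * math.exp(-(x - mu * t)**2 / (2 * sigma**2 * t)))

def midpoint_integral(g, a, b, n):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

x, mu, sigma = 1.0, 1.0, 1.0       # illustrative values only
upper, n = 60.0, 120_000           # truncation point and grid size (assumptions)

total = midpoint_integral(lambda t: first_passage_density(t, x, mu, sigma), 0.0, upper, n)
mean = midpoint_integral(lambda t: t * first_passage_density(t, x, mu, sigma), 0.0, upper, n)
second = midpoint_integral(lambda t: t**2 * first_passage_density(t, x, mu, sigma), 0.0, upper, n)
variance = second - mean**2

print("numerical  :", total, mean, variance)
print("from (11.41):", 1.0, x / mu, x * sigma**2 / mu**3)
```

The midpoint rule never evaluates the density at t = 0, where the factor t^(-3/2) is singular but the exponential factor forces the density to zero.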
Let

$$F_{L(x)}(t) = P(L(x) \le t) \quad \text{and} \quad \bar F_{L(x)}(t) = 1 - F_{L(x)}(t), \quad t \ge 0.$$

Assuming $x > 0$ and $\mu > 0$, integration of (11.40) yields

$$\bar F_{L(x)}(t) = \Phi\left(\frac{x - \mu t}{\sigma\sqrt{t}}\right) - e^{2x\mu/\sigma^2}\, \Phi\left(-\frac{x + \mu t}{\sigma\sqrt{t}}\right), \quad t > 0. \tag{11.42}$$


If the second term on the right-hand side of (11.42) is sufficiently small, then one obtains
an interesting result: The Birnbaum-Saunders distribution (7.159) on page 330
as a limit distribution of first passage times of compound renewal processes approximately
coincides with the inverse Gaussian distribution.
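As a plausibility check of (11.42), the survival function of the first passage time can be compared with one minus the numerically integrated density (11.40). The sketch below assumes x > 0 and mu > 0. The function names, the grid size and the test point are illustrative choices.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def first_passage_survival(t, x, mu, sigma):
    """P(L(x) > t) as given by (11.42), for x > 0, mu > 0, t > 0."""
    r = sigma * math.sqrt(t)
    return (std_normal_cdf((x - mu * t) / r)
            - math.exp(2 * x * mu / sigma**2) * std_normal_cdf(-(x + mu * t) / r))

def first_passage_density(t, x, mu, sigma):
    """Density (11.40) of L(x)."""
    return (x / (math.sqrt(2 * math.pi) * sigma * t**1.5)
            * math.exp(-(x - mu * t)**2 / (2 * sigma**2 * t)))

def survival_by_integration(t, x, mu, sigma, n=20_000):
    """1 minus the integral of (11.40) over (0, t], composite midpoint rule."""
    h = t / n
    return 1.0 - h * sum(first_passage_density((k + 0.5) * h, x, mu, sigma)
                         for k in range(n))

x, mu, sigma = 1.0, 1.0, 1.0       # illustrative values only
for t in (0.5, 1.0, 3.0):
    print(t, first_passage_survival(t, x, mu, sigma),
          survival_by_integration(t, x, mu, sigma))
```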


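After some tedious algebra, the Laplace transform of $f_{L(x)}(t)$ is seen to be

$$E\big(e^{-sL(x)}\big) = \int_0^\infty e^{-st} f_{L(x)}(t)\,dt = \exp\left\{-\frac{x}{\sigma^2}\left(\sqrt{2\sigma^2 s + \mu^2} - \mu\right)\right\}. \tag{11.43}$$
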


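Theorem 11.4 Let $M$ be the absolute maximum of the Brownian motion with drift
on the positive semiaxis $(0, \infty)$:

$$M = \max_{t \in (0,\infty)} D(t).$$

Then, for any positive $x$,

$$P(M > x) = \begin{cases} 1 & \text{for } \mu > 0, \\ e^{-2|\mu|x/\sigma^2} & \text{for } \mu < 0. \end{cases} \tag{11.44}$$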


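Proof In view of (11.26), it is sufficient to prove (11.44) for $\mu < 0$. The exponential
martingale $\{Y(t), t \ge 0\}$ with $Y(t) = e^{\alpha S(t) - \alpha^2 t/2}$ (theorem 11.2) is stopped at time
$L(x)$. In view of

$$D(L(x)) = \mu L(x) + \sigma S(L(x)) = x,$$

the random variable $Y(L(x))$ can be represented as

$$Y(L(x)) = \exp\left\{\frac{\alpha}{\sigma}\,[x - \mu L(x)] - \alpha^2 L(x)/2\right\} = \exp\left\{\frac{\alpha}{\sigma}\,x - \left[\frac{\alpha\mu}{\sigma} + \alpha^2/2\right] L(x)\right\}.$$

Hence,

$$E(Y(L(x))) = e^{\alpha x/\sigma}\, E\left(\exp\left\{-\left[\frac{\alpha\mu}{\sigma} + \alpha^2/2\right] L(x)\right\} \,\Big|\, L(x) < \infty\right) P(L(x) < \infty)$$
$$\quad + e^{\alpha x/\sigma}\, E\left(\exp\left\{-\left[\frac{\alpha\mu}{\sigma} + \alpha^2/2\right] L(x)\right\} \,\Big|\, L(x) = \infty\right) P(L(x) = \infty).$$

Let $\alpha > 2|\mu|/\sigma$. Then the second term disappears and theorem 10.3 yields

$$1 = e^{\alpha x/\sigma}\, E\left(\exp\left\{-\left[\frac{\alpha\mu}{\sigma} + \alpha^2/2\right] L(x)\right\} \,\Big|\, L(x) < \infty\right) P(L(x) < \infty).$$

Since $P(M > x) = P(L(x) < \infty)$, letting $\alpha \downarrow 2|\mu|/\sigma$ yields the desired result. ∎

Corollary The maximal value, which a Brownian motion with negative drift assumes
in $(0, +\infty)$, has an exponential distribution with parameter

$$\lambda = \frac{2|\mu|}{\sigma^2}. \tag{11.45}$$
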


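Example 11.4 (Leaving an interval) Analogously to example 11.3, let $L(a,b)$ be the
first time point at which the Brownian motion with drift $\{D(t), t \ge 0\}$ hits either
value $a$ or value $b$, $b < 0 < a$.

Let $p_{a,b}$ be the probability that $\{D(t), t \ge 0\}$ hits level $a$ before level $b$ given $\mu \ne 0$:

$$p_{a,b} = P(L(a) < L(b)) = P(D(L(a,b)) = a).$$

For establishing an equation in $p_{a,b}$, the exponential martingale in theorem 11.2 with

$$S(t) = (D(t) - \mu t)/\sigma$$

is stopped at time $L(a,b)$. From theorem 10.3,

$$1 = E\left[\exp\left\{\frac{\alpha}{\sigma}\big(D(L(a,b)) - \mu L(a,b)\big) - \frac{\alpha^2 L(a,b)}{2}\right\}\right].$$

Equivalently,

$$1 = E\left[\exp\left\{\frac{\alpha}{\sigma}\, D(L(a,b)) - \left(\frac{\alpha\mu}{\sigma} + \frac{\alpha^2}{2}\right) L(a,b)\right\}\right].$$

Let $\alpha = -2\mu/\sigma$. Then,

$$1 = E\big(e^{-2\mu D(L(a,b))/\sigma^2}\big) = p_{a,b}\, e^{-2\mu a/\sigma^2} + (1 - p_{a,b})\, e^{-2\mu b/\sigma^2}.$$

Solving this equation for $p_{a,b}$ yields

$$p_{a,b} = \frac{1 - e^{-2\mu b/\sigma^2}}{e^{-2\mu a/\sigma^2} - e^{-2\mu b/\sigma^2}}. \tag{11.46}$$

If $\mu < 0$ and $b$ tends to $-\infty$ in (11.46), then the probability $p_{a,b}$ becomes $P(L(a) < \infty)$,
which proves once more formula (11.44) with $x = a$.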
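The limit argument can be illustrated numerically. The following minimal sketch evaluates (11.46) for a negative drift and increasingly low levels b and compares the result with the exponential expression in (11.44). The function names and the numbers are illustrative assumptions.

```python
import math

def prob_hit_a_before_b(a, b, mu, sigma):
    """(11.46): P(L(a) < L(b)) for D(0) = 0, b < 0 < a, mu != 0."""
    k = -2.0 * mu / sigma**2
    return (1.0 - math.exp(k * b)) / (math.exp(k * a) - math.exp(k * b))

def prob_ever_reach(a, mu, sigma):
    """(11.44) for mu < 0: P(M > a) = P(L(a) < infinity)."""
    return math.exp(-2.0 * abs(mu) * a / sigma**2)

a, mu, sigma = 1.0, -0.5, 1.0      # illustrative values only
for b in (-1.0, -5.0, -50.0):
    print(b, prob_hit_a_before_b(a, b, mu, sigma))
print("limit:", prob_ever_reach(a, mu, sigma))
```

A second sanity check is reflection: hitting b before a under drift mu has the same probability as hitting -b before -a under drift -mu, so the two values must add up to 1.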
Generally, for a shifted Brownian motion with drift $\{D_u(t), t \ge 0\}$,

$$D_u(t) = u + D(t), \quad b < u < a,\ \mu \ne 0,$$

formula (11.46) yields the corresponding probability $p_{a,b}(u)$ by replacing there $a$
and $b$ with $a - u$ and $b - u$, respectively ($u$ can be negative):

$$p_{a,b}(u) = P(L(a) < L(b)\,|\,D_u(0) = u) = \frac{e^{-2\mu u/\sigma^2} - e^{-2\mu b/\sigma^2}}{e^{-2\mu a/\sigma^2} - e^{-2\mu b/\sigma^2}}.$$

□

Geometric Brownian Motion with Drift Let $\{D(t), t \ge 0\}$ be a Brownian motion
with drift. Then the stochastic process $\{X(t), t \ge 0\}$ with

$$X(t) = e^{D(t)} \tag{11.47}$$

is called geometric Brownian motion with drift. If the drift $\mu$ is 0, then $\{X(t), t \ge 0\}$ is
simply the geometric Brownian motion as defined by (11.31).

The Laplace transform of $D(t)$ is obtained by multiplying (11.4) by $e^{-\alpha\mu t}$:

$$E\big(e^{-\alpha D(t)}\big) = e^{-\alpha\mu t + \alpha^2\sigma^2 t/2}. \tag{11.48}$$

Letting respectively $\alpha = -1$ and $\alpha = -2$ yields the first and the second moment of $X(t)$:

$$E(X(t)) = e^{(\mu + \sigma^2/2)t}, \quad E(X^2(t)) = e^{(2\mu + 2\sigma^2)t}. \tag{11.49}$$

Therefore, by formula (2.62), page 67,

$$\operatorname{Var}(X(t)) = e^{(2\mu + \sigma^2)t}\big(e^{\sigma^2 t} - 1\big).$$

Since the inequalities $e^{D(t)} \ge x$ and $D(t) \ge \ln x$ are equivalent, the first passage time
results obtained for the Brownian motion with drift can immediately be used for
characterizing the first passage time behavior of the geometric Brownian motion with
drift with level $\ln x$ instead of $x$, $x > 0$.
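The step from the Laplace transform (11.48) to the moments (11.49) and the variance can be retraced in a few lines. In this sketch the function names are hypothetical, and the parameter values are chosen only for illustration.

```python
import math

def laplace_transform_D(alpha, t, mu, sigma):
    """E(exp(-alpha * D(t))) as in (11.48)."""
    return math.exp(-alpha * mu * t + alpha**2 * sigma**2 * t / 2)

def gbm_mean(t, mu, sigma):
    """First moment of X(t) = exp(D(t)): (11.48) with alpha = -1."""
    return laplace_transform_D(-1.0, t, mu, sigma)

def gbm_second_moment(t, mu, sigma):
    """Second moment of X(t): (11.48) with alpha = -2."""
    return laplace_transform_D(-2.0, t, mu, sigma)

def gbm_variance(t, mu, sigma):
    """Closed-form variance of X(t) stated after (11.49)."""
    return math.exp((2 * mu + sigma**2) * t) * (math.exp(sigma**2 * t) - 1)

t, mu, sigma = 100.0, 0.01, 0.08   # illustrative values only
print(gbm_mean(t, mu, sigma), math.exp((mu + sigma**2 / 2) * t))
print(gbm_second_moment(t, mu, sigma) - gbm_mean(t, mu, sigma)**2,
      gbm_variance(t, mu, sigma))
```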


11.5.5.2 Application to Option Pricing 


In financial modeling, Brownian motion and its transformations are used to describe
the evolvement in time of prices of risky assets such as shares, precious metals, crops, and
combinations of them. Derivatives are financial products, which derive their values
from one or more risky assets. Options belong to the most popular derivatives. An
option is a contract, which entitles (but does not oblige) its holder (owner) to either buy
or sell a risky asset at a fixed, predetermined price, called strike price or exercise
price. A call (put) option gives its holder the right to buy (to sell). An option has a
finite or an infinite expiration or maturity date, which is determined by the contract.
An American option can be exercised at any time point up to its expiration; a European
option can only be exercised at the time point of its expiration. So one can expect
that an American option with finite expiration time $\tau$ is more expensive than a European
option with the same expiration time if they are based on the same risky assets.


A basic problem in option trading is: What amount of money should a speculator pay
to the writer (seller) of an option at the time of signing the contract to become holder
of the option? Common sense suggests that the writer will fix the option price at a level,
which is somewhat higher than the mean payoff (profit) the speculator will achieve
by acquiring this option. Hence, the following examples focus on calculating the
mean (expected) payoff of a holder. For instance, if a European call option has the
finite expiration date $\tau$, strike price $x_s$, and the random price (value) of the underlying
risky asset at time $t$ is $X(t)$, then the holder will achieve a positive random
payoff of $X(\tau) - x_s$ if $X(\tau) > x_s$. If $X(\tau) < x_s$, then the owner will not exercise. This
would make no financial sense, since in addition to the price the holder had to pay
for acquiring the option, he/she would suffer a further loss of $x_s - X(\tau)$. In case of a
European put option, the owner will exercise at time $\tau$ if $X(\tau) < x_s$ and make a random
profit of $x_s - X(\tau)$. Thus, owners of European call or put options will achieve
the random payoffs (Figure 11.5)

$$\max(X(\tau) - x_s, 0) \quad \text{and} \quad \max(x_s - X(\tau), 0),$$

respectively. But, to emphasize it once more, 'payoff' in this context is not the net
payoff of the holder. The holder's mean net profit is, if the model assumptions are
correct, on average zero or even negative, since at the time of signing the contract
he/she had to pay a price for becoming a holder.
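The two payoff expressions translate directly into code. The following minimal sketch uses hypothetical function names and made-up prices. It only illustrates that exactly one of the two options pays off unless the price at expiration equals the strike price.

```python
def european_call_payoff(price_at_expiry, strike):
    """max(X(tau) - x_s, 0): the holder buys only if the price exceeds the strike."""
    return max(price_at_expiry - strike, 0.0)

def european_put_payoff(price_at_expiry, strike):
    """max(x_s - X(tau), 0): the holder sells only if the price is below the strike."""
    return max(strike - price_at_expiry, 0.0)

strike = 10.0                               # hypothetical strike price
for price in (8.0, 10.0, 12.5):             # hypothetical prices at expiration
    print(price, european_call_payoff(price, strike),
          european_put_payoff(price, strike))
```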


Figure 11.5 Payoff from a European option (share price path $x(t)$ starting at $x_0$, strike price $x_s$, time $t_m$ of the maximum share price, expiration time $\tau$)


Figure 11.5 illustrates the situation for a European option with expiration time $\tau$. The
underlying share price (risky asset) starts at the selling time of the option $t = 0$ at
value $x_0$ per unit and ends at value $x(\tau)$. If a holder owns a European call option, he
or she would not exercise, but for an owner of an American call option based on the
same share there had been opportunities for making a profit (maximum payoff at time
$t_m$). A holder of a European put option would have made a profit of $x_s - x(\tau)$.


Closely related to options is another kind of derivative called forward contract. A forward
contract is an agreement between two parties, say Tom and Huckleberry. At time $t = 0$, Tom
agrees to buy a risky asset from Huckleberry at time $\tau$ for a certain price $Z(\tau)$, called delivery
price. Huckleberry agrees both with the maturity date $\tau$ and the delivery price $Z(\tau)$, and
they sign the contract. Unlike with options, Tom must buy at the maturity date and Huckleberry
must sell at maturity, and no money changes hands between Tom and Huckleberry when signing
the contract at time $t = 0$. If at the time of maturity the price $X(\tau)$ of the risky security
exceeds the delivery price $Z(\tau)$, then Tom will win, otherwise Huckleberry. Determining the
profit of Tom (Huckleberry) is quite analogous to determining the profit of the holder (price)
of a European option. Related to forward contracts are futures contracts. They differ from each
other mainly by administrative issues.

Another basic aspect in finance is discounting. Due to interest and inflation rates, the
value, which a certain amount of money has today, will not be the value which the same
amount of money has tomorrow. In financial calculations, in particular in option pricing,
this phenomenon is taken into account by a discount factor.

The following examples deal with option pricing under rather simplistic assumptions.
For detailed and more general expositions, see, e.g., Lin (2006) and Kijima (2013).


Example 11.5 The price of a share at time $t$ is given by a shifted Brownian motion
$\{X(t) = D_{x_0}(t), t \ge 0\}$ with negative drift $\mu$ and volatility $\sigma = \sqrt{\operatorname{Var}(B(1))}$:

$$X(t) = x_0 + D(t) = x_0 + \mu t + B(t). \tag{11.50}$$

Thus, $x_0$ is the initial price of the share: $x_0 = X(0)$. Based on this share, Huckleberry
holds an American call option with strike price $x_s$, $x_s \ge x_0$.

The option has no finite expiry date. Although the price of the share is on average
decreasing, Huckleberry hopes to profit from random share price fluctuations. He
makes up his mind to exercise the option at that time point, when the share price for
the first time reaches value $x$ with $x > x_s$. Therefore, if the holder exercises, his
payoff will be $x - x_s$ (Figure 11.6). By following this policy, Huckleberry's mean
payoff (gain) is

$$G(x) = (x - x_s)\,p(x) + 0 \cdot (1 - p(x)) = (x - x_s)\,p(x),$$

where $p(x)$ is the probability that the share price will ever reach level $x$. Equivalently,
$p(x)$ is the probability that the Brownian motion with drift $\{D(t), t \ge 0\}$ will ever
reach level $x - x_0$. Since the option has no finite expiration date, this probability is
given by (11.44) if there $x$ is replaced with $x - x_0$. Hence, Huckleberry's mean payoff is

$$G(x) = (x - x_s)\, e^{-\lambda (x - x_0)} \quad \text{with} \quad \lambda = 2|\mu|/\sigma^2. \tag{11.51}$$


Figure 11.6 Payoff from random share price fluctuations

The condition $dG(x)/dx = 0$ yields the optimal value of $x$: Huckleberry will exercise
as soon as the share price hits level

$$x^* = x_s + 1/\lambda. \tag{11.52}$$

The corresponding maximal mean payoff is

$$G(x^*) = \frac{1}{\lambda\, e^{\lambda (x_s - x_0) + 1}}. \tag{11.53}$$
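The optimal exercise level (11.52) and the maximal mean payoff (11.53) can be cross-checked by a brute-force search over a grid of exercise levels. The sketch below uses hypothetical share and option data. The function names and the grid resolution are assumptions as well.

```python
import math

def mean_payoff(x, x0, xs, mu, sigma):
    """G(x) from (11.51); requires mu < 0 and x > xs >= x0."""
    lam = 2.0 * abs(mu) / sigma**2
    return (x - xs) * math.exp(-lam * (x - x0))

def optimal_level(xs, mu, sigma):
    """x* = xs + 1/lambda from (11.52)."""
    return xs + sigma**2 / (2.0 * abs(mu))

def maximal_mean_payoff(x0, xs, mu, sigma):
    """G(x*) from (11.53)."""
    lam = 2.0 * abs(mu) / sigma**2
    return 1.0 / (lam * math.exp(lam * (xs - x0) + 1.0))

x0, xs, mu, sigma = 10.0, 12.0, -0.1, 0.5   # hypothetical share and option data
x_star = optimal_level(xs, mu, sigma)

# Brute-force search over exercise levels xs + 0.001, ..., xs + 10
grid = [xs + 0.001 * k for k in range(1, 10_001)]
x_grid = max(grid, key=lambda x: mean_payoff(x, x0, xs, mu, sigma))

print("x* from (11.52):", x_star, " grid search:", x_grid)
print("G(x*) from (11.53):", maximal_mean_payoff(x0, xs, mu, sigma),
      " direct:", mean_payoff(x_star, x0, xs, mu, sigma))
```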


Discounted Payoff Let the constant (risk free) discount rate $\alpha$ be positive. The discounted
payoff from exercising the option at time $t$ on condition that the share has at
time $t$ price $x$ with $x > x_s$ is

$$e^{-\alpha t}(x - x_s).$$

Since under the policy considered, Huckleberry exercises the option at the random
time $L_D(x - x_0)$, which is the first passage time of $\{D(t), t \ge 0\}$ with respect to level
$x - x_0$, his random discounted payoff is

$$e^{-\alpha L_D(x - x_0)}(x - x_s),$$

so that Huckleberry's mean discounted payoff is

$$G_\alpha(x) = (x - x_s) \int_0^\infty e^{-\alpha t} f_{L_D(x - x_0)}(t)\,dt, \tag{11.54}$$

where the density $f_{L_D(x - x_0)}(t)$ is given by (11.40) with $x$ replaced by $x - x_0$. The integral
in (11.54) is equal to the Laplace transform of $f_{L_D(x - x_0)}(t)$ with parameter $s = \alpha$.
Thus, from (11.43),

$$G_\alpha(x) = (x - x_s) \exp\left\{-\frac{x - x_0}{\sigma^2}\left(\sqrt{2\sigma^2\alpha + \mu^2} - \mu\right)\right\}. \tag{11.55}$$

The functional structures of the mean undiscounted payoff and the mean discounted
payoff as given by (11.51) and (11.55), respectively, are identical. Hence the optimal
parameters with respect to $G_\alpha(x)$ are again given by (11.52) and (11.53) with $\lambda$ replaced by

$$\gamma = \frac{1}{\sigma^2}\left(\sqrt{2\sigma^2\alpha + \mu^2} - \mu\right). \tag{11.56}$$

Note that maximizing $G_\alpha(x)$ makes sense also for a positive drift parameter $\mu$. □


Example 11.6 Since for a negative drift parameter $\mu$ the sample paths of a stochastic
process $\{X(t), t \ge 0\}$ of structure (11.50) eventually become negative with probability
one, the share price model (11.50) has only limited application, in particular in
cases of infinite expiration dates. Hence, in such a situation it seems to be more realistic
to assume that the share price at time $t$ is, apart from a constant factor, modeled
by a geometric Brownian motion with drift:

$$X(t) = x_0\, e^{D(t)}, \quad t \ge 0.$$

The other assumptions as well as the formulation of the problem and the notation
introduced in the previous example remain valid. In particular, the price of the share
at time $t = 0$ is again equal to $x_0$.

The random event '$X(t) \ge x$' with $x > x_0$ is equivalent to

$$D(t) \ge \ln(x/x_0).$$

Therefore, by (11.44), the probability that the share price will ever reach level $x$ is

$$p(x) = e^{-\lambda \ln(x/x_0)} = \left(\frac{x_0}{x}\right)^{\lambda}.$$

If the holder exercises the option as soon as the share price is $x$, his mean payoff is

$$G(x) = (x - x_s)\left(\frac{x_0}{x}\right)^{\lambda}. \tag{11.57}$$

The optimal level $x = x^*$ is

$$x^* = \frac{\lambda}{\lambda - 1}\, x_s. \tag{11.58}$$

To ensure that $x^* > x_s > 0$, an additional assumption has to be made:

$$\lambda = 2|\mu|/\sigma^2 > 1.$$

The corresponding maximal mean payoff is

$$G(x^*) = \left(\frac{\lambda - 1}{x_s}\right)^{\lambda - 1}\left(\frac{x_0}{\lambda}\right)^{\lambda}. \tag{11.59}$$

Discounted Payoff The undiscounted payoff $x - x_s$ is made when $\{D(t), t \ge 0\}$ hits
level $\ln(x/x_0)$ for the first time, i.e., at time

$$L_D(\ln(x/x_0)).$$

Using this and proceeding as in the previous example, the mean discounted payoff is
seen to be

$$G_\alpha(x) = (x - x_s)\left(\frac{x_0}{x}\right)^{\gamma} \tag{11.60}$$

with $\gamma$ given by (11.56). The functional forms of the mean undiscounted payoff (11.57) and
the mean discounted payoff (11.60) are identical. Hence, the corresponding optimal values $x^*$ and
$G_\alpha(x^*)$ are given by (11.58) and (11.59) if in these formulas $\lambda$ is replaced with $\gamma$.
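Because (11.57) and (11.60) share one functional form, a single routine with a generic exponent covers both the undiscounted and the discounted case. The following minimal sketch assumes hypothetical data. The names `gamma_param`, `mean_payoff`, `optimal_level` and `maximal_mean_payoff` are illustrative and not taken from the text.

```python
import math

def gamma_param(alpha, mu, sigma):
    """gamma from (11.56); alpha = 0 gives lambda = 2|mu|/sigma^2 when mu < 0."""
    return (math.sqrt(2 * sigma**2 * alpha + mu**2) - mu) / sigma**2

def mean_payoff(x, x0, xs, g):
    """(11.57) with g = lambda, or (11.60) with g = gamma."""
    return (x - xs) * (x0 / x)**g

def optimal_level(xs, g):
    """(11.58), valid for g > 1."""
    return g / (g - 1.0) * xs

def maximal_mean_payoff(x0, xs, g):
    """(11.59), valid for g > 1."""
    return ((g - 1.0) / xs)**(g - 1.0) * (x0 / g)**g

x0, xs, mu, sigma, alpha = 10.0, 12.0, -0.1, 0.3, 0.05   # hypothetical data
for label, g in (("undiscounted", gamma_param(0.0, mu, sigma)),
                 ("discounted", gamma_param(alpha, mu, sigma))):
    x_star = optimal_level(xs, g)
    print(label, g, x_star, mean_payoff(x_star, x0, xs, g),
          maximal_mean_payoff(x0, xs, g))
```

With these data the discounted exponent is larger than the undiscounted one, so the holder exercises at a lower level when future payoffs are discounted.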


Note that condition $\gamma > 1$ is equivalent to

$$2(\alpha - \mu) > \sigma^2.$$

As in the previous example, a positive drift parameter $\mu$ need not be excluded. □

Example 11.7 (Formula of Black-Scholes-Merton) A European call option is considered
with strike price $x_s$ and expiration date $\tau$. The option is based on a risky asset
the price of which, apart from a constant factor $x_0$, develops according to a geometric
Brownian motion with drift $\{X(t), t \ge 0\}$:

$$X(t) = x_0\, e^{D(t)}, \quad t \ge 0.$$

The holder will buy if $X(\tau) > x_s$. Then, given a constant discount rate $\alpha$, his random
discounted payoff is

$$[e^{-\alpha\tau}(X(\tau) - x_s)]^+ = \max[e^{-\alpha\tau}(X(\tau) - x_s),\, 0].$$

Hence, the holder's mean discounted profit will be

$$G_\alpha(\tau; \mu, \sigma) = E\big([e^{-\alpha\tau}(X(\tau) - x_s)]^+\big).$$

In view of $D(\tau) = N(\mu\tau, \sigma^2\tau)$,

$$G_\alpha(\tau; \mu, \sigma) = e^{-\alpha\tau} \int_{\ln(x_s/x_0)}^{\infty} (x_0 e^y - x_s)\, \frac{1}{\sqrt{2\pi\sigma^2\tau}} \exp\left\{-\frac{(y - \mu\tau)^2}{2\sigma^2\tau}\right\} dy.$$

Substituting $u = \dfrac{y - \mu\tau}{\sigma\sqrt{\tau}}$ and letting

$$c = \frac{\ln(x_s/x_0) - \mu\tau}{\sigma\sqrt{\tau}}$$

yields

$$G_\alpha(\tau; \mu, \sigma) = x_0\, e^{(\mu - \alpha)\tau}\, \frac{1}{\sqrt{2\pi}} \int_c^\infty e^{u\sigma\sqrt{\tau}}\, e^{-u^2/2}\, du - x_s\, e^{-\alpha\tau}\, \frac{1}{\sqrt{2\pi}} \int_c^\infty e^{-u^2/2}\, du.$$

By substituting in the first integral $u = y + \sigma\sqrt{\tau}$ one obtains

$$\int_c^\infty e^{u\sigma\sqrt{\tau}}\, e^{-u^2/2}\, du = e^{\sigma^2\tau/2} \int_{c - \sigma\sqrt{\tau}}^\infty e^{-y^2/2}\, dy.$$

Hence,

$$G_\alpha(\tau; \mu, \sigma) = x_0\, e^{(\mu - \alpha + \sigma^2/2)\tau}\, \frac{1}{\sqrt{2\pi}} \int_{c - \sigma\sqrt{\tau}}^\infty e^{-y^2/2}\, dy - x_s\, e^{-\alpha\tau}\, \frac{1}{\sqrt{2\pi}} \int_c^\infty e^{-u^2/2}\, du$$
$$= x_0\, e^{(\mu - \alpha + \sigma^2/2)\tau}\, \Phi(\sigma\sqrt{\tau} - c) - x_s\, e^{-\alpha\tau}\, \Phi(-c).$$

At time $t$, the discounted price of the risky security is

$$X_\alpha(t) = e^{-\alpha t} X(t) = x_0\, e^{-(\alpha - \mu)t + \sigma S(t)},$$

where $\{S(t), t \ge 0\}$ is the standard Brownian motion. By theorem 11.2, the stochastic
process $\{X_\alpha(t), t \ge 0\}$ is a martingale (exponential martingale) if $\alpha - \mu = \sigma^2/2$. On
this condition, the mean discounted payoff of the holder is given by the Formula of
Black-Scholes-Merton

$$G_\alpha(\tau, \sigma) = x_0\, \Phi(\sigma\sqrt{\tau} - c) - x_s\, e^{-\alpha\tau}\, \Phi(-c) \tag{11.61}$$

(Black and Scholes (1973), Merton (1973)). In this formula, the influence of the drift
$\mu$ on the price development has been eliminated by the assumption that the
discounted price of the risky asset develops according to a martingale. The formula
of Black-Scholes-Merton gives the fair price of the option. This is partially motivated
by the fact that a martingale has a constant trend function so that, on average,
holder and writer of this option will neither lose nor win. □
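The closed-form expressions of this example are straightforward to evaluate. The following minimal sketch implements the general mean discounted payoff and, as a special case, formula (11.61). The standard normal distribution function is built from `math.erf`. The function names and the sample data (initial price 100, strike price 100, one year to expiration, discount rate 0.05, volatility 0.2) are assumptions made for illustration only.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_discounted_payoff(x0, xs, tau, alpha, mu, sigma):
    """G_alpha(tau; mu, sigma) of a European call for X(t) = x0 * exp(D(t))."""
    root = sigma * math.sqrt(tau)
    c = (math.log(xs / x0) - mu * tau) / root
    return (x0 * math.exp((mu - alpha + sigma**2 / 2) * tau) * std_normal_cdf(root - c)
            - xs * math.exp(-alpha * tau) * std_normal_cdf(-c))

def black_scholes_merton_call(x0, xs, tau, alpha, sigma):
    """(11.61): the drift is fixed by the martingale condition alpha - mu = sigma^2/2."""
    return mean_discounted_payoff(x0, xs, tau, alpha, alpha - sigma**2 / 2, sigma)

# Hypothetical data: x0 = 100, xs = 100, tau = 1, alpha = 0.05, sigma = 0.2
print(black_scholes_merton_call(100.0, 100.0, 1.0, 0.05, 0.2))
```

Keeping the drift as an explicit argument of `mean_discounted_payoff` makes it easy to see how far the mean discounted payoff under an arbitrary drift departs from the fair price.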


11.5.5.3 Application to Maintenance 


In the following example, a functional of the Brownian motion will be used to model
the random cumulative repair cost $X(t)$ a technical system causes over a time period
$[0, t]$. The following basic situation is considered: A system starts working at time
$t = 0$. As soon as $X(t)$ reaches level $x$, the system is replaced by an equivalent new
one in negligibly small time. The cost of each replacement is $c$, and after a replacement
a system is 'as good as new'. With regard to cost and length, all replacement
cycles are independent of each other. Scheduling of replacements aims at minimizing
the long-run total maintenance cost per unit time, in what follows referred to as maintenance
cost rate.


Policy 1 The system is replaced by a new one as soon as the cumulative repair cost
$X(t)$ reaches a given positive level $x$.


By the renewal reward theorem, i.e., by formula (7.148), page 325, the corresponding
maintenance cost rate is

$$K_1(x) = \frac{x + c}{E(L(x))}. \tag{11.62}$$

Policy 1 basically needs the same input as the economic lifetime policy, which is introduced
next for the sake of comparisons.


Policy 2 The system is replaced by a new one after reaching its economic lifetime,
which is defined as that value $\tau = \tau^*$, which minimizes the average maintenance cost
per unit time $K_2(\tau)$ if the system is always replaced by a new one after $\tau$ time units.

Again from the renewal reward theorem, $K_2(\tau)$ is given by

$$K_2(\tau) = \frac{E(X(\tau)) + c}{\tau}. \tag{11.63}$$

In this case a replacement cycle has the constant length $\tau$.

Example 11.8 The cost of a replacement is \$10000, and the cumulative repair cost
$X(t)$ [in \$] has structure

$$X(t) = 0.1\, e^{D(t)}, \tag{11.64}$$

where $\{D(t), t \ge 0\}$ is the Brownian motion with drift parameter $\mu = 0.01$ [h$^{-1}$] and
variance parameter $\sigma^2 = 0.0064$, i.e., in terms of the standard Brownian motion,

$$D(t) = 0.01\,t + 0.08\,S(t).$$

Policy 1 The stochastic repair cost process $\{X(t), t \ge 0\}$ reaches level $x$ at that time
point when the Brownian motion with drift $\{D(t), t \ge 0\}$ reaches level $\ln 10x$:

$$X(t) = x \iff D(t) = \ln 10x.$$

Hence, by formula (11.41), the mean value of the first passage time of the process
$\{X(t), t \ge 0\}$ with regard to level $x$ is

$$E(L_X(x)) = \frac{1}{0.01} \ln 10x = 100 \ln 10x.$$

The corresponding maintenance cost rate (11.62) is

$$K_1(x) = \frac{x + 10000}{100 \ln 10x}.$$

A limit $x = x^*$ minimizing $K_1(x)$ satisfies the necessary condition $dK_1(x)/dx = 0$ or

$$x \ln 10x - x = 10000.$$

The unique solution of this equation is (slightly rounded)

$$x^* = 1192.4\ [\$] \quad \text{so that} \quad K_1(x^*) = 11.92\ [\$/h]. \tag{11.65}$$

The mean length of an optimum replacement cycle is

$$E(L_X(x^*)) = 939\ [h].$$

Policy 2 Since by (11.49),

$$E(e^{D(t)}) = e^{(\mu + \sigma^2/2)t} = e^{0.0132\,t},$$

the corresponding maintenance cost rate (11.63) is

$$K_2(\tau) = \frac{10000 + 0.1\, e^{0.0132\,\tau}}{\tau}.$$

The optimal values are

$$\tau^* = 712\ [h] \quad \text{and} \quad K_2(\tau^*) = 15.74\ [\$/h]. \tag{11.66}$$

Thus, applying policy 1 instead of policy 2 reduces the maintenance cost rate by
about 25%.
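Both optimization problems of this example reduce to finding the root of a monotone function, so a plain bisection is sufficient. The following minimal sketch uses the data of the example. The helper `bisect`, the search intervals and the tolerance are assumptions made for illustration. Under these assumptions the results should land close to the values reported in (11.65) and (11.66).

```python
import math

def bisect(f, lo, hi, tol=1e-9):
    """Root of f in [lo, hi] by bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c, x0, mu, var = 10000.0, 0.1, 0.01, 0.0064     # data of example 11.8

# Policy 1: K1(x) = (x + c) / E(L_X(x)) with E(L_X(x)) = ln(x / x0) / mu
def k1(x):
    return (x + c) / (math.log(x / x0) / mu)

# dK1/dx = 0  <=>  x * ln(x / x0) - x - c = 0
x_star = bisect(lambda x: x * math.log(x / x0) - x - c, 100.0, 10000.0)

# Policy 2: K2(tau) = (c + x0 * exp(rate * tau)) / tau with rate = mu + var / 2
rate = mu + var / 2

def k2(tau):
    return (c + x0 * math.exp(rate * tau)) / tau

# dK2/dtau = 0  <=>  x0 * exp(rate * tau) * (rate * tau - 1) - c = 0
tau_star = bisect(lambda tau: x0 * math.exp(rate * tau) * (rate * tau - 1.0) - c,
                  100.0, 2000.0)

print("policy 1:", x_star, k1(x_star), math.log(x_star / x0) / mu)
print("policy 2:", tau_star, k2(tau_star))
print("relative saving:", 1.0 - k1(x_star) / k2(tau_star))
```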


At first glance, a disadvantage of modelling repair cost processes by functionals of
the Brownian motion is that these functionals generally are not monotone increasing.
However, in this example $\{X(t), t \ge 0\}$ hits a level $x$ for the first time at that time
point when the Brownian motion with drift $\{D(t), t \ge 0\}$ reaches level $\ln 10x$. In
other words, if a replacement cycle is given by the random interval $[0, L_D(y))$, then
the processes $\{D(t), t \ge 0\}$ and $\{M(t), t \ge 0\}$ hit a positive level $y$ for the first time at
the same time point, namely $L_D(y)$. Hence, replacing $\{D(t), t \ge 0\}$ in the cumulative
cost process $\{X(t), t \ge 0\}$ given by (11.64) with the 'maximum process' $\{M(t), t \ge 0\}$
defined by

$$M(t) = \max_{0 \le y \le t} D(y)$$

would, with regard to policy 1, yield the same optimal values $x^*$ and $K_1(x^*)$ as
the ones given by formulas (11.65). The sample paths of $\{M(t), t \ge 0\}$ are nondecreasing
and therefore, principally suitable for modelling the cumulative evolvement
of repair costs. In the light of this it makes sense, and it is actually necessary to apply
policy 2 to the cumulative repair cost process

$$X(t) = 0.1\, e^{M(t)}, \quad t \ge 0,$$

and to compare the results to (11.66). The probability distribution of $M(t)$ is given by
the distribution of the first passage time $L(x) = L_D(x)$ since $P(L(x) \le t) = P(M(t) \ge x)$.
Hence, by (11.40),

$$P(M(t) \ge x) = \int_0^t \frac{x}{0.08\sqrt{2\pi}\, y^{1.5}}\, e^{-\frac{(x - 0.01y)^2}{0.0128\, y}}\, dy.$$

Making use of formula (2.55), page 64, with $h(x) = e^x$ yields the corresponding
maintenance cost rate in the form

$$K_2(\tau|M) = \frac{1}{\tau}\left[10000 + 0.1 \int_0^\infty x\, e^{x} \left(\int_0^\tau \frac{1}{0.08\sqrt{2\pi}\, y^{1.5}}\, e^{-\frac{(x - 0.01y)^2}{0.0128\, y}}\, dy\right) dx\right].$$

The optimal values are

$$\tau^* = 696\ [h] \quad \text{and} \quad K_2(\tau^*|M) = 16.112.$$

They are quite close to the ones given by (11.66). As expected, $K_2(\tau^*|M) > K_2(\tau^*)$
with the respective $\tau^*$-values. □


11.5.6 Integrated Brownian Motion 


If $\{B(t), t \ge 0\}$ is a Brownian motion, then its sample paths $b = b(t)$ are continuous.
Hence, the integrals

$$u(t) = \int_0^t b(y)\,dy$$

exist for all sample paths. They are realizations of the random integral

$$U(t) = \int_0^t B(y)\,dy.$$

The stochastic process $\{U(t), t \ge 0\}$ is called integrated Brownian motion. This process
can be a suitable model for situations, in which the observed sample paths seem
to be 'smoother' than those of the Brownian motion. Analogously to the definition of
the Riemann integral, for any n-dimensional vector $(t_1, t_2, ..., t_n)$ with

$$0 = t_0 < t_1 < \cdots < t_n = t \quad \text{and} \quad \Delta t_i = t_{i+1} - t_i;\ i = 0, 1, 2, ..., n-1,$$

the random integral $U(t)$ is defined as the limit

$$U(t) = \lim_{\substack{n \to \infty \\ \Delta t_i \to 0}} \sum_{i=0}^{n-1} B(t_i)\,\Delta t_i,$$

where passing to the limit refers to mean-square convergence. Therefore, the random
variable $U(t)$, being the limit of a sum of normally distributed random variables,
is itself normally distributed. More generally, by theorem 3.3, page 149, the integrated
Brownian motion $\{U(t), t \ge 0\}$ is a Gaussian process. Hence, the integrated Brownian
motion is uniquely characterized by its trend and covariance function. In view of

$$E\left(\int_0^t B(y)\,dy\right) = \int_0^t E(B(y))\,dy = \int_0^t 0\,dy = 0,$$

the trend function of the integrated Brownian motion is identically equal to 0:

$$m(t) = E(U(t)) = 0.$$

The covariance function

$$C(s,t) = \operatorname{Cov}(U(s), U(t)) = E(U(s)\,U(t)), \quad s \le t,$$

of $\{U(t), t \ge 0\}$ is obtained as follows:

$$C(s,t) = E\left\{\int_0^t B(y)\,dy \int_0^s B(z)\,dz\right\} = E\left\{\int_0^s \int_0^t B(y)B(z)\,dy\,dz\right\} = \int_0^s \int_0^t E(B(y)B(z))\,dy\,dz.$$

Since $E(B(y)B(z)) = \operatorname{Cov}(B(y), B(z)) = \sigma^2 \min(y,z)$, it follows that

$$C(s,t) = \sigma^2 \int_0^s \int_0^t \min(y,z)\,dy\,dz$$
$$= \sigma^2 \int_0^s \int_0^s \min(y,z)\,dy\,dz + \sigma^2 \int_0^s \int_s^t \min(y,z)\,dy\,dz$$
$$= \sigma^2 \int_0^s \left[\int_0^z y\,dy + \int_z^s z\,dy\right] dz + \sigma^2 \int_0^s \int_s^t z\,dy\,dz$$
$$= \sigma^2\, \frac{s^3}{3} + \sigma^2\, \frac{s^2 (t-s)}{2}.$$

Thus, $C(s,t) = \dfrac{\sigma^2}{6}\,(3t - s)\,s^2$, $s \le t$.
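The closed form of the covariance function can be compared with a direct numerical evaluation of the double integral of sigma^2 min(y, z). The following minimal sketch uses a composite midpoint rule on a rectangular grid. The function names, the grid size and the test point are illustrative assumptions.

```python
def integrated_bm_cov(s, t, sigma=1.0):
    """Closed form sigma^2 * (3*max - min) * min^2 / 6, valid for any order of s and t."""
    lo, hi = min(s, t), max(s, t)
    return sigma**2 * (3.0 * hi - lo) * lo**2 / 6.0

def integrated_bm_cov_numeric(s, t, sigma=1.0, n=400):
    """Midpoint rule for sigma^2 times the integral of min(y, z) over [0, t] x [0, s]."""
    hy, hz = t / n, s / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * hy
        for j in range(n):
            total += min(y, (j + 0.5) * hz)
    return sigma**2 * total * hy * hz

s, t, sigma = 1.0, 2.0, 1.0        # illustrative values only
print(integrated_bm_cov(s, t, sigma), integrated_bm_cov_numeric(s, t, sigma))
```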
Letting $s = t$ yields the variance of $U(t)$:

$$\operatorname{Var}(U(t)) = \frac{\sigma^2}{3}\, t^3.$$

11.6 EXERCISES


Note In all exercises, $\{B(t), t \ge 0\}$ is the Brownian motion with $\operatorname{Var}(B(1)) = \sigma^2$.


11.1) Verify that the probability density $f_t(x)$ of $B(t)$,

$$f_t(x) = \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-x^2/(2\sigma^2 t)}, \quad t > 0,$$

satisfies with a positive constant $c$ the thermal conduction equation

$$\frac{\partial f_t(x)}{\partial t} = c\, \frac{\partial^2 f_t(x)}{\partial x^2}.$$


11.2) Determine the conditional probability density of $B(t)$ given $B(s) = y$, $0 \le s < t$.


11.3)* Prove that the stochastic process $\{\bar B(t), 0 \le t \le 1\}$ given by $\bar B(t) = B(t) - tB(1)$
is the Brownian bridge.


11.4) Let $\{\bar B(t), 0 \le t \le 1\}$ be the Brownian bridge. Prove that the stochastic process
$\{S(t), t \ge 0\}$ defined by

$$S(t) = (t+1)\, \bar B\left(\frac{t}{t+1}\right)$$

is the standard Brownian motion.

11.5) Determine the probability density of $B(s) + B(t)$, $0 \le s < t$.


11.6) Let $n$ be any positive integer. Determine mean value and variance of

$$X(n) = B(1) + B(2) + \cdots + B(n).$$

Hint Make use of formula (4.52), page 187.


11.7) Check whether for any positive $\tau$ the stochastic process $\{V(t), t \ge 0\}$ defined by

$$V(t) = B(t + \tau) - B(t)$$

is weakly stationary.


11.8) Let $X(t) = S^3(t) - 3tS(t)$. Prove that $\{X(t), t \ge 0\}$ is a continuous-time martingale,
i.e., show that

$$E(X(t)\,|\,X(y),\ y \le s) = X(s), \quad s < t.$$


11.9) Show by a counterexample that the Ornstein-Uhlenbeck process does not have 
independent increments. 


11.10) (1) What is the mean value of the first passage time of the reflected Brownian
motion $\{|B(t)|, t \ge 0\}$ with regard to a positive level $x$?


(2) Determine the distribution function of $|B(t)|$.

11.11)* Starting from $x = 0$, a particle makes independent jumps of length

$$\Delta x = \sigma\sqrt{\Delta t}$$

to the right or to the left every $\Delta t$ time units. The respective probabilities of jumps to
the right and to the left are

$$p = \frac{1}{2}\left(1 + \frac{\mu}{\sigma}\sqrt{\Delta t}\right) \quad \text{and} \quad 1 - p \quad \text{with} \quad \sqrt{\Delta t} \le \left|\frac{\sigma}{\mu}\right|,\ \sigma > 0.$$

Show that as $\Delta t \to 0$ the position of the particle at time $t$ is governed by a Brownian
motion with drift with parameters $\mu$ and $\sigma$.


11.12) Let $\{D(t), t \ge 0\}$ be a Brownian motion with drift with parameters $\mu$ and $\sigma$.
Determine

$$E\left(\int_0^t (D(s))^2\,ds\right).$$


11.13) Show that for $c > 0$ and $d > 0$

$$P(B(t) \le ct + d \ \text{for all}\ t \ge 0) = 1 - e^{-2cd/\sigma^2}.$$

Hint Make use of formula (11.29).


11.14) At time $t = 0$ a speculator acquires an American call option with infinite expiration
time and strike price $x_s$. The price [in \$] of the underlying risky security at
time $t$ is given by $X(t) = x_0\, e^{D(t)}$. The speculator makes up his mind to exercise this
option at that time point, when the price of the risky security hits for the first time
level $x$ with $x > x_s > x_0$.

(1) What is the speculator's mean discounted payoff $G_\alpha(x)$ under a constant discount
rate $\alpha$?

(2) What is the speculator's payoff $G(x)$ without discounting?

In both cases, the cost of acquiring the option is not included in the speculator's payoff.


11.15) The price of a unit of a share at time point $t$ is $X(t) = 10\, e^{D(t)}$, $t \ge 0$, where
$\{D(t), t \ge 0\}$ is a Brownian motion process with drift parameter $\mu = -0.01$ and volatility
$\sigma = 0.1$. At time $t = 0$ a speculator acquires an option, which gives him the right
to buy a unit of the share at strike price $x_s = 10.5$ at any time point in the future,
independently of the then current market value. It is assumed that this option has no
expiry date. Although the drift parameter is negative, the investor hopes to profit
from random fluctuations of the share price. He makes up his mind to exercise the
option at that time point, when the expected difference between the actual share price
$x$ and the strike price $x_s$ is maximal.


(1) What is the initial price of a unit of the share?

(2) Is the share price on average increasing or decreasing?


(3) Determine the corresponding share price which maximizes the expected profit of
the speculator.

11.16) The value (in \$) of a share per unit develops, apart from the constant factor 10,
according to a geometric Brownian motion $\{X(t), t \ge 0\}$ given by

$$X(t) = 10\, e^{B(t)}, \quad 0 \le t \le 120,$$

where $\{B(t), t \ge 0\}$ is the Brownian motion process with volatility $\sigma = 0.1$.
At time $t = 0$ a speculator pays \$17 for becoming owner of a unit of the share after
120 [days], irrespective of the then current market value of the share.

(1) What will be the mean undiscounted profit of the speculator at time point $t = 120$?

(2) What is the probability that the investor will lose some money when exercising at
this time point?


In both cases, take into account the amount of \$17, which the speculator had to pay
in advance.

11.17) The value of a share per unit develops according to a geometric Brownian motion
with drift given by


X(t) =10 e2 ++0.1 SO t>0, 
where $\{S(t), t \ge 0\}$ is the standardized Brownian motion. An investor owns a European
call option with running time $\tau = 1$ [year] and with strike price

$$x_s = \$12$$

on a unit of this share.


(1) Given a discount rate of $\alpha = 0.04$, determine the mean discounted profit of the
holder of the option.


(2) For what value of the drift parameter $\mu$ do you get the fair price of the option?

(3) Determine this fair price.


11.18) The random price $X(t)$ of a risky security per unit at time $t$ is
X(f) = 5e0.014+B()+0.2|B(0| 


where $\{B(t), t \ge 0\}$ is the Brownian motion with volatility $\sigma = 0.04$.


At time $t = 0$ a speculator acquires the right to buy the share at price \$5.1 at any time
point in the future, independently of the then current market value; i.e., the
speculator owns an American call option with strike price $x_s = \$5.1$ on the share.
The speculator makes up his mind to exercise the option at that time point, when the
mean difference between the actual share price $x$ and the strike price is maximal.


(1) Is the stochastic process $\{X(t), t \ge 0\}$ a geometric Brownian motion with drift?

(2) Is the share price on average increasing or decreasing?

(3) Determine the optimal actual share price $x = x^*$.

(4) What is the probability that the investor will exercise the option?

11.19) At time $t = 0$ a speculator acquires a European call option with strike price $x_s$
and finite expiration time $\tau$. Thus, the option can only be exercised at time $\tau$ at price
$x_s$ independently of its market value at time $\tau$. The random price $X(t)$ of the underlying
risky security develops according to

$$X(t) = x_0 + D(t),$$

where $\{D(t), t \ge 0\}$ is the Brownian motion with positive drift parameter $\mu$ and
volatility $\sigma$. If $X(\tau) > x_s$, the speculator will exercise the option. Otherwise, the
speculator will not exercise. Assume that

$$x_0 + \mu t > 3\sigma\sqrt{t}, \quad 0 \le t \le \tau.$$


(1) What will be the mean undiscounted payoff of the speculator (cost of acquiring
the option not included)?


(2) Under otherwise the same assumptions, what is the investor's mean undiscounted
profit if

$$X(t) = x_0 + B(t) \quad \text{and} \quad x_0 = x_s?$$


11.20) Show that
\[ E\big(e^{\alpha U(t)}\big) = e^{\alpha^2 t^3/6} \]
for any constant \(\alpha\), where \(U(t)\) is the integrated standard Brownian motion:

\[ U(t) = \int_0^t S(x)\,dx, \quad t \ge 0. \]

11.21)* For any fixed positive \(\tau\), let the stochastic process \(\{V(t), t \ge 0\}\) be given by
\[ V(t) = \int_t^{t+\tau} S(x)\,dx. \]
Is \(\{V(t), t \ge 0\}\) weakly stationary?


11.22) Let \(\{X(t), t \ge 0\}\) be the cumulative repair cost process of a system with
\[ X(t) = 0.01\,e^{D(t)}, \]
where \(\{D(t), t \ge 0\}\) is a Brownian motion with drift and parameters
\(\mu = 0.02\) and \(\sigma^2 = 0.04\).
The cost of a system replacement by an equivalent new one is \(c = 4000\).

(1) The system is replaced according to policy 1 (page 522). Determine the optimal
repair cost limit \(x^*\) and the corresponding maintenance cost rate \(K_1(x^*)\).

(2) The system is replaced according to policy 2 (page 522). Determine its economic
lifetime \(\tau^*\) based on the average repair cost development

\[ E(X(t)) = 0.01\,E\big(e^{D(t)}\big) \]

and the corresponding maintenance cost rate \(K_2(\tau^*)\).

(3) Analogously to example 11.8, apply replacement policy 2 to the cumulative repair
cost process

\[ X(t) = 0.01\,e^{M(t)} \]

with \(M(t) = \max_{0 \le y \le t} D(y)\). Determine the corresponding economic lifetime of the system
and the maintenance cost rate \(K_2(\tau^*|M)\). Compare to the minimal maintenance cost
rates determined under (1) and (2).


For part (3) of this exercise you need computer assistance. 


CHAPTER 12 


Spectral Analysis of Stationary Processes 


12.1 FOUNDATIONS 


Covariance functions of weakly stationary stochastic processes can be represented by 
their spectral densities. These spectral representations of covariance functions have 
proved a useful analytic tool in many technical and physical applications. 


The mathematical treatment of spectral representations and the application of the
results, particularly in electrical engineering and electronics, is facilitated by introducing
the concept of a complex stochastic process: \(\{X(t), t \in \mathbf{R}\}\) is a complex stochastic
process if \(X(t)\) has structure

\[ X(t) = Y(t) + i\,Z(t), \quad \mathbf{R} = (-\infty, +\infty), \]

where \(\{Y(t), t \in \mathbf{R}\}\) and \(\{Z(t), t \in \mathbf{R}\}\) are two real-valued stochastic processes and
\(i = \sqrt{-1}\). Thus, the probability distribution of \(X(t)\) is given by the joint probability
distribution of the random vector \((Y(t), Z(t))\), \(t \in \mathbf{R}\). Trend and covariance
function of \(\{X(t), t \in \mathbf{R}\}\) are defined by

\[ m(t) = E(X(t)) = E(Y(t)) + i\,E(Z(t)), \tag{12.1} \]
\[ C(s,t) = \mathrm{Cov}(X(s), X(t)) = E\big([X(s) - E(X(s))]\,\overline{[X(t) - E(X(t))]}\big). \tag{12.2} \]
If \(X(t)\) is real, then (12.1) and (12.2) coincide with (6.2) and (6.3), respectively.
Notation If \(z = a + ib\) and \(\bar{z} = a - ib\), then \(z\) and \(\bar{z}\) are conjugate complex numbers. The modulus
of \(z\), denoted by \(|z|\), is defined as \(|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2}\).
A complex stochastic process \(\{X(t), t \in \mathbf{R}\}\) is a second-order process if

\[ E(|X(t)|^2) < \infty \quad \text{for all } t \in \mathbf{R}. \]

Analogously to the definition of real-valued weakly stationary stochastic processes (see
page 232), a second-order complex stochastic process \(\{X(t), t \in \mathbf{R}\}\) is said to be
weakly stationary if, with a complex constant \(m\), it has the following properties:

1) \(m(t) \equiv m\),

2) \(C(s,t) = C(0, t-s)\).

In this case, \(C(s,t)\) simplifies to a function of one variable:
\[ C(s,t) = C(\tau), \]
where \(\tau = t - s\).
Ergodicity If the complex stochastic process \(\{X(t), t \in \mathbf{R}\}\) is weakly stationary, then
one anticipates that, for any of its sample paths \(x(t) = y(t) + i\,z(t)\), its constant trend
function \(m = E(X(t))\) can be obtained by
\[ m = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{+T} x(t)\,dt. \tag{12.3} \]
This representation of the trend function as an improper integral uses the full information
which is contained in one sample path of the process.

On the other hand, if \(n\) sample paths of the process
\[ x_1(t), x_2(t), \ldots, x_n(t) \]
are each only scanned at one fixed time point \(t\) and if these values are obtained independently
of each other, then \(m\) can be estimated by

\[ m = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x_i(t). \tag{12.4} \]

The equivalence of formulas (12.3) and (12.4) allows a simple physical interpretation:
the mean of a stationary stochastic process at a given time point is equal to its mean
over the whole observation period ('time mean is equal to location mean'). With
respect to their practical application, this is the most important property of ergodic
stochastic processes. Besides the representation (12.2), for any sample path \(x = x(t)\),
the covariance function of an ergodic process can be obtained from

\[ C(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{+T} [x(t) - m]\,\overline{[x(t+\tau) - m]}\,dt. \tag{12.5} \]


The exact definition of ergodic stochastic processes cannot be given here. In the
engineering literature, the ergodicity of stationary processes is frequently simply
defined by properties (12.3) and (12.5). The application of formula (12.5) is useful if
the sample path of an ongoing stochastic process is being recorded continuously. The
estimate of \(C(\tau)\) improves as the observation period \([-T, +T]\) grows.
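The time averages (12.3) and (12.5) can be tried out numerically. The following is only a minimal sketch under stated assumptions: the sample path is a cosine oscillation with a fixed amplitude and one realization of a uniformly distributed phase, the window length, step size and the helper name `ergodic_estimates` are illustrative choices, and the limit \(T \to \infty\) is replaced by a long finite window. For this path the exact covariance is \((a^2/2)\cos\omega\tau\).

```python
import numpy as np

def ergodic_estimates(x, lag_steps):
    """Time-average estimates of m (12.3) and C(tau) (12.5) from one sampled path x."""
    m = np.mean(x)
    n = len(x) - lag_steps
    c = np.mean((x[:n] - m) * np.conj(x[lag_steps:lag_steps + n] - m))
    return m, c

rng = np.random.default_rng(0)
omega, amp = 2.0, 1.5                      # illustrative frequency and fixed amplitude
phi = rng.uniform(0.0, 2.0 * np.pi)        # one realization of the random phase
dt = 0.01
t = np.arange(-2000.0, 2000.0, dt)         # finite window standing in for [-T, +T]
x = amp * np.cos(omega * t + phi)          # one sample path

tau = 0.5
m_hat, c_hat = ergodic_estimates(x, int(round(tau / dt)))
c_exact = 0.5 * amp**2 * np.cos(omega * tau)
print(m_hat, c_hat, c_exact)
```

The amplitude is held constant on purpose: a time average over a single path can only recover the amplitude of that path, not the mean square of a random amplitude.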


Assumptions This chapter deals only with weakly stationary processes. Hence, the attribute
'weakly' is generally omitted. Moreover, without loss of generality, the trend function of all
processes considered is identically zero.

Under this assumption, the representation (12.2) of the covariance function simplifies to

\[ C(\tau) = C(t, t+\tau) = E\big(X(t)\,\overline{X(t+\tau)}\big). \tag{12.6} \]
In what follows, Euler's formula is needed:
\[ e^{\pm ix} = \cos x \pm i \sin x. \tag{12.7} \]

Solving for \(\sin x\) and \(\cos x\) yields

\[ \sin x = \frac{1}{2i}\big(e^{ix} - e^{-ix}\big), \quad \cos x = \frac{1}{2}\big(e^{ix} + e^{-ix}\big). \tag{12.8} \]
12.2 PROCESSES WITH DISCRETE SPECTRUM


In this section the general structure of stationary stochastic processes with discrete
spectra is developed. First the simple stochastic process \(\{X(t), t \in \mathbf{R}\}\) with

\[ X(t) = a(t)\,X \tag{12.9} \]

is considered, where \(X\) is a complex random variable and \(a(t)\) a complex function
with \(a(t) \not\equiv\) constant. For \(\{X(t), t \in \mathbf{R}\}\) to be stationary, the two conditions

\[ E(X) = 0 \quad \text{and} \quad E(|X|^2) < \infty \]

are necessary. Moreover, because of (12.6) the function

\[ E\big(X(t)\,\overline{X(t+\tau)}\big) = a(t)\,\overline{a(t+\tau)}\,E(|X|^2) \tag{12.10} \]
is not allowed to depend on \(t\). Letting \(\tau = 0\), this implies

\[ a(t)\,\overline{a(t)} = |a(t)|^2 = |a|^2 = \text{constant}. \]
Therefore, \(a(t)\) has structure

\[ a(t) = |a|\,e^{i\omega(t)}, \tag{12.11} \]

where \(\omega(t)\) is a real function. Substituting (12.11) into (12.10) shows that the
difference \(\omega(t+\tau) - \omega(t)\) does not depend on \(t\). Thus, if \(\omega(t)\) is assumed to be
differentiable, then it satisfies the equation

\[ d[\omega(t+\tau) - \omega(t)]/dt = 0, \]
or, equivalently,

\[ \frac{d}{dt}\,\omega(t) = \text{constant}. \]

Hence, \(\omega(t) = \omega t + \varphi\), where \(\omega\) and \(\varphi\) are constants. (Note that for proving this result
it is only necessary to assume the continuity of \(\omega(t)\).) Thus,

\[ a(t) = |a|\,e^{i(\omega t + \varphi)}. \]


If in (12.9) the random variable \(X\) is multiplied by \(|a|e^{i\varphi}\) and \(|a|e^{i\varphi}X\) is again
denoted as \(X\), then the desired result assumes the following form:
A stochastic process \(\{X(t), t \in \mathbf{R}\}\) defined by (12.9) is stationary if and only if
\[ X(t) = X\,e^{i\omega t} \tag{12.12} \]
with \(E(X) = 0\) and \(E(|X|^2) < \infty\).

Letting \(s = E(|X|^2)\), the corresponding covariance function is
\[ C(\tau) = s\,e^{-i\omega\tau}. \]


Remark Apart from a constant factor, the parameter s is physically equal to the mean energy 
of the oscillation per unit time (mean power). 
The real part \(\{Y(t), t \in \mathbf{R}\}\) of the stochastic process \(\{X(t), t \in \mathbf{R}\}\) given by (12.12)
describes a cosine oscillation with random amplitude and phase. Its sample paths
have, therefore, structure

\[ y(t) = a \cos(\omega t + \varphi), \]

where \(a\) and \(\varphi\) are realizations of possibly dependent random variables \(A\) and \(\Phi\). The
parameter \(\omega\) is the circular frequency of the oscillation.


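Generalizing the situation dealt with so far, a linear combination of two stationary
processes of structure (12.12) is considered:

\[ X(t) = X_1 e^{i\omega_1 t} + X_2 e^{i\omega_2 t}. \tag{12.13} \]
\(X_1\) and \(X_2\) are two complex random variables with mean values 0, whereas \(\omega_1\) and
\(\omega_2\) are two constant real numbers with \(\omega_1 \ne \omega_2\). The covariance function of the
stochastic process \(\{X(t), t \in \mathbf{R}\}\) defined by (12.13) is

\[ C(t, t+\tau) = E\big(X(t)\,\overline{X(t+\tau)}\big) \]
\[ = E\big([X_1 e^{i\omega_1 t} + X_2 e^{i\omega_2 t}]\,[\bar{X}_1 e^{-i\omega_1 (t+\tau)} + \bar{X}_2 e^{-i\omega_2 (t+\tau)}]\big) \]
\[ = E\big([X_1 \bar{X}_1 e^{-i\omega_1 \tau} + X_1 \bar{X}_2 e^{i(\omega_1 - \omega_2)t - i\omega_2 \tau}]\big) \]
\[ \quad + E\big([X_2 \bar{X}_2 e^{-i\omega_2 \tau} + X_2 \bar{X}_1 e^{i(\omega_2 - \omega_1)t - i\omega_1 \tau}]\big). \]
Thus, \(\{X(t), t \in \mathbf{R}\}\) is stationary if and only if \(X_1\) and \(X_2\) are uncorrelated.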


Note Two complex random variables \(X\) and \(Y\) with mean values 0 are said to be uncorrelated
if they satisfy the condition \(E(X\bar{Y}) = 0\) or, equivalently, \(E(Y\bar{X}) = 0\), and correlated otherwise.


In this case, the covariance function of \(\{X(t), t \in \mathbf{R}\}\) is given by
\[ C(\tau) = s_1 e^{-i\omega_1 \tau} + s_2 e^{-i\omega_2 \tau}, \tag{12.14} \]
where
\[ s_1 = E(|X_1|^2), \quad s_2 = E(|X_2|^2). \]
Generalizing equation (12.13) leads to
\[ X(t) = \sum_{k=1}^{n} X_k e^{i\omega_k t} \tag{12.15} \]
with real numbers \(\omega_k\) satisfying \(\omega_j \ne \omega_k\) for \(j \ne k\); \(j, k = 1, 2, \ldots, n\). If the \(X_k\) are
uncorrelated and have mean value 0, then it can be readily shown by induction that the
process \(\{X(t), t \in \mathbf{R}\}\) is stationary. Its covariance function is

\[ C(\tau) = \sum_{k=1}^{n} s_k e^{-i\omega_k \tau}, \tag{12.16} \]

where
\[ s_k = E(|X_k|^2); \quad k = 1, 2, \ldots, n. \]
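A small Monte Carlo experiment can illustrate that a superposition of harmonic oscillations with uncorrelated, zero-mean amplitudes has a covariance that depends only on the lag. The sketch below is illustrative only: it assumes independent circular complex Gaussian amplitudes (one convenient way to obtain uncorrelated \(X_k\) with \(E(|X_k|^2) = s_k\)), and the frequencies, the values \(s_k\), the sample size and the function names are arbitrary choices. It compares the ensemble estimate of \(E(X(t)\overline{X(t+\tau)})\) at two different time points \(t\) with \(\sum_k s_k e^{-i\omega_k\tau}\).

```python
import numpy as np

rng = np.random.default_rng(1)
omegas = np.array([0.5, 1.0, 2.5])     # distinct frequencies omega_k (illustrative)
s = np.array([1.0, 0.5, 2.0])          # s_k = E(|X_k|^2) (illustrative)
n_paths = 200_000

# Independent zero-mean complex Gaussian amplitudes with E(|X_k|^2) = s_k (assumption).
X = (rng.normal(size=(n_paths, 3)) + 1j * rng.normal(size=(n_paths, 3))) * np.sqrt(s / 2)

def process(t):
    """Values X(t) = sum_k X_k exp(i omega_k t) for all simulated paths."""
    return X @ np.exp(1j * omegas * t)

def cov_estimate(t, tau):
    """Ensemble estimate of E(X(t) * conj(X(t + tau)))."""
    return np.mean(process(t) * np.conj(process(t + tau)))

tau = 0.7
c_theory = np.sum(s * np.exp(-1j * omegas * tau))
c_at_0 = cov_estimate(0.0, tau)        # estimate at t = 0
c_at_5 = cov_estimate(5.0, tau)        # estimate at t = 5; should agree up to sampling error
print(c_theory, c_at_0, c_at_5)
```

Both estimates should agree with the theoretical value up to sampling error, which is the stationarity property in question.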
In particular,
\[ C(0) = E(|X(t)|^2) = \sum_{k=1}^{n} s_k. \tag{12.17} \]
The oscillation \(X(t)\) given by (12.15) is an additive superposition of \(n\) harmonic
oscillations. Its mean power is equal to the sum of the mean powers of these \(n\)
harmonic oscillations.


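Now let \(X_1, X_2, \ldots\) be a countably infinite sequence of uncorrelated complex random
variables with \(E(X_k) = 0\); \(k = 1, 2, \ldots\); and

\[ \sum_{k=1}^{\infty} E(|X_k|^2) = \sum_{k=1}^{\infty} s_k < \infty. \tag{12.18} \]

Under these assumptions, the equation

\[ X(t) = \sum_{k=1}^{\infty} X_k e^{i\omega_k t}, \quad \omega_j \ne \omega_k \text{ for } j \ne k, \tag{12.19} \]
defines a stationary process \(\{X(t), t \in \mathbf{R}\}\) with covariance function

\[ C(\tau) = \sum_{k=1}^{\infty} s_k e^{-i\omega_k \tau}. \tag{12.20} \]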


Figure 12.1 Sample path of a real narrow-band process

Figure 12.2 Sample path of a real wide-band process for large n
The sets \(\{\omega_1, \omega_2, \ldots, \omega_n\}\) and \(\{\omega_1, \omega_2, \ldots\}\) are said to be the spectra of the stochastic
processes defined by (12.15) and (12.19), respectively. If all \(\omega_k\) are sufficiently close
to a single value \(\omega\), then \(\{X(t), t \in \mathbf{R}\}\) is called a narrow-band process (Figure 12.1);
otherwise it is called a wide-band process (Figure 12.2). With regard to convergence in
mean-square, any stationary process \(\{X(t), t \in \mathbf{R}\}\) can be approximated sufficiently
closely by a stationary process of structure (12.15) in any finite interval \(-T \le t \le +T\).


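Later it will prove useful to represent the covariance function (12.20) in terms of the
delta function \(\delta(t)\). This function is defined as the limit

\[ \delta(t) = \lim_{h \to 0} \begin{cases} 1/h & \text{for } -h/2 \le t \le +h/2, \\ 0 & \text{elsewhere.} \end{cases} \tag{12.21} \]
Symbolically,
\[ \delta(t) = \begin{cases} \infty & \text{for } t = 0, \\ 0 & \text{elsewhere.} \end{cases} \]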


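The delta-function has a characteristic property, which is sometimes used as its
definition: For any function \(f(t)\),

\[ \int_{-\infty}^{+\infty} f(t)\,\delta(t - t_0)\,dt = f(t_0). \tag{12.22} \]
The proof is easily established: If \(F(t)\) is the antiderivative of \(f(t)\), then

\[ \int_{-\infty}^{+\infty} f(t)\,\delta(t - t_0)\,dt = \int_{-\infty}^{+\infty} f(t + t_0)\,\delta(t)\,dt \]
\[ = \lim_{h \to 0} \int_{-h/2}^{+h/2} f(t + t_0)\,\frac{1}{h}\,dt = \lim_{h \to 0} \frac{1}{h}\big[F(t_0 + h/2) - F(t_0 - h/2)\big] \]
\[ = \frac{1}{2}\left\{ \lim_{h \to 0} \frac{F(t_0 + h/2) - F(t_0)}{h/2} + \lim_{h \to 0} \frac{F(t_0) - F(t_0 - h/2)}{h/2} \right\} \]
\[ = \frac{1}{2}\{f(t_0) + f(t_0)\} = f(t_0). \]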
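The limit in this proof can be observed numerically by replacing the delta function with the rectangular pulse of height \(1/h\) from its definition and letting \(h\) shrink. The following sketch is illustrative only; the test function \(\cos t\), the point \(t_0 = 0.3\), the midpoint rule and the function name are arbitrary choices.

```python
import math

def smoothed_delta_integral(f, t0, h, n=10_000):
    """Midpoint-rule value of the integral of f(t) * (1/h) over [t0 - h/2, t0 + h/2]."""
    step = h / n
    total = 0.0
    for k in range(n):
        t = t0 - h / 2 + (k + 0.5) * step
        total += f(t) * (1.0 / h) * step
    return total

# The pulse width h shrinks; the integral should approach f(t0).
for h in (1.0, 0.1, 0.01):
    print(h, smoothed_delta_integral(math.cos, 0.3, h))
print("f(t0) =", math.cos(0.3))
```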
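Using property (12.22), the covariance function (12.20) can be written as
\[ C(\tau) = \sum_{k=1}^{\infty} s_k \int_{-\infty}^{+\infty} e^{-i\tau\omega}\,\delta(\omega - \omega_k)\,d\omega. \]

Symbolically,

\[ C(\tau) = \int_{-\infty}^{+\infty} e^{-i\tau\omega}\,s(\omega)\,d\omega, \tag{12.23} \]
where

\[ s(\omega) = \sum_{k=1}^{\infty} s_k\,\delta(\omega - \omega_k). \tag{12.24} \]

The (generalized) function \(s(\omega)\) is called the spectral density of the stationary
process. Therefore, \(C(\tau)\) is the Fourier transform of the spectral density of a
stationary process with discrete spectrum.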
Real Stationary Processes In contrast to a stochastic process of structure (12.12), a
stationary process \(\{X(t), t \in \mathbf{R}\}\) of structure (12.13), i.e.,

\[ X(t) = X_1 e^{i\omega_1 t} + X_2 e^{i\omega_2 t}, \]
can be real. To see this, let
\[ X_1 = \tfrac{1}{2}(A + iB), \quad X_2 = \bar{X}_1 = \tfrac{1}{2}(A - iB), \quad \text{and} \quad \omega_1 = -\omega_2 = \omega, \]
where \(A\) and \(B\) are two real random variables with mean values 0. Substituting these
\(X_1\) and \(X_2\) into (12.13) yields (compare to Example 6.7, page 235)
\[ X(t) = A\cos\omega t - B\sin\omega t. \]

If \(A\) and \(B\) are uncorrelated, then, letting \(s = E(|X_1|^2) = E(|X_2|^2)\), the covariance
function of \(\{X(t), t \in \mathbf{R}\}\) is seen to be \(C(\tau) = 2s\cos\omega\tau\). More generally, it can be
shown that the process given by (12.15) with \(n\) terms defines a real stationary process
if \(n\) is even and pairs of the \(X_k\) are complex conjugates.


12.3 PROCESSES WITH CONTINUOUS SPECTRUM


12.3.1 Spectral Representation of the Covariance Function 


Let \(\{X(t), t \in \mathbf{R}\}\) be a complex stationary process with covariance function \(C(\tau)\).
Then there exists a real, nondecreasing, and bounded function \(S(\omega)\) so that

\[ C(\tau) = \int_{-\infty}^{+\infty} e^{i\tau\omega}\,dS(\omega). \tag{12.25} \]

(This fundamental relationship is associated with the names of Bochner, Khinchin
and Wiener; see, e.g., Khinchin (1934).) \(S(\omega)\) is called the spectral function of the
process. The definition of the covariance function implies that for all \(t\)

\[ C(0) = S(\infty) - S(-\infty) = E(|X(t)|^2) < \infty. \]

Given \(C(\tau)\), the spectral function is uniquely determined apart from an additive constant \(c\).
Usually \(c\) is selected in such a way that \(S(-\infty) = 0\). If \(s(\omega) = dS(\omega)/d\omega\) exists, then

\[ C(\tau) = \int_{-\infty}^{+\infty} e^{i\tau\omega}\,s(\omega)\,d\omega. \tag{12.26} \]
The function \(s(\omega)\) is called the spectral density of the process. Since \(S(\omega)\) is
nondecreasing and bounded, the spectral density has properties
\[ s(\omega) \ge 0, \quad C(0) = \int_{-\infty}^{+\infty} s(\omega)\,d\omega < \infty. \tag{12.27} \]
Conversely, it can be shown that every function \(s(\omega)\) with properties (12.27) is the
spectral density of a stationary process.

Remark Frequently the function \(f(\omega) = s(\omega)/2\pi\) is referred to as the spectral density. An
advantage of this representation is that \(\int_{-\infty}^{+\infty} f(\omega)\,d\omega\) is the mean power of the oscillation.
The set \(\mathbf{S} = \{\omega, s(\omega) > 0\}\) with its lower and upper marginal points

\[ \inf_{\omega \in \mathbf{S}} \omega \quad \text{and} \quad \sup_{\omega \in \mathbf{S}} \omega \]

is said to be the (continuous) spectrum of the process. Its bandwidth \(w\) is defined as

\[ w = \sup_{\omega \in \mathbf{S}} \omega - \inf_{\omega \in \mathbf{S}} \omega. \]

Note Here and in what follows mind the difference between \(w\) and \(\omega\).


Determining the covariance function is generally much simpler than determining the
spectral density. Hence the inversion of the relationship (12.26) is of importance. It is
known from the theory of the Fourier integral that this inversion is always possible if

\[ \int_{-\infty}^{+\infty} |C(t)|\,dt < \infty. \tag{12.28} \]

In this case,
\[ s(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\omega t}\,C(t)\,dt. \tag{12.29} \]

The intuitive interpretation of assumption (12.28) is that \(C(\tau)\) must converge to 0
sufficiently fast as \(|\tau| \to \infty\). The stationary processes occurring in electrical engineering
and communication generally satisfy this condition. Integration of \(s(\omega)\) over the
interval \([\omega_1, \omega_2]\), \(\omega_1 < \omega_2\), yields

\[ S(\omega_2) - S(\omega_1) = \frac{i}{2\pi} \int_{-\infty}^{+\infty} \frac{e^{-i\omega_2 t} - e^{-i\omega_1 t}}{t}\,C(t)\,dt. \tag{12.30} \]


—0 


This formula is also valid if the spectral density does not exist. But in this case the
additional assumption has to be made that at each point of discontinuity \(\omega_0\) of the
spectral function the following value is assigned to \(S(\omega)\):

\[ S(\omega_0) = \frac{1}{2}\big[S(\omega_0 + 0) + S(\omega_0 - 0)\big]. \]


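Note that the delta function \(\delta(t)\) satisfies condition (12.28). If \(\delta(t)\) is substituted for
\(C(t)\) in (12.29), then formula (12.22) yields

\[ s(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\omega t}\,\delta(t)\,dt = \frac{1}{2\pi}. \tag{12.31} \]

The formal inversion of this relationship according to (12.26) provides a complex
representation of the delta function:

\[ \delta(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{i\omega t}\,d\omega. \tag{12.32} \]
The time-discrete analogues to formulas (12.28) and (12.29) are

\[ \sum_{t=-\infty}^{+\infty} |C(t)| < \infty, \quad s(\omega) = \frac{1}{2\pi} \sum_{t=-\infty}^{+\infty} C(t)\,e^{-it\omega}. \tag{12.33} \]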
Real Stationary Processes Since for any real stationary process \(C(-\tau) = C(\tau)\), the
covariance function can be written in the form

\[ C(\tau) = [C(\tau) + C(-\tau)]/2. \]

Substituting (12.26) for \(C(\tau)\) into this equation and using (12.8) yields

\[ C(\tau) = \int_{-\infty}^{+\infty} \cos\omega\tau\; s(\omega)\,d\omega. \]
Because \(\cos\omega\tau = \cos(-\omega\tau)\) and, for a real process, \(s(\omega) = s(-\omega)\), this formula can be written as

\[ C(\tau) = 2 \int_{0}^{+\infty} \cos\omega\tau\; s(\omega)\,d\omega. \tag{12.34} \]
Analogously, (12.29) yields the spectral density in the form

\[ s(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \cos\omega t\; C(t)\,dt. \]
Hence \(s(\omega) = s(-\omega)\), and since the integrand is an even function of \(t\),

\[ s(\omega) = \frac{1}{\pi} \int_{0}^{+\infty} \cos\omega t\; C(t)\,dt. \tag{12.35} \]

Even in the case of real processes it is, however, sometimes more convenient to use the
formulas (12.26) and (12.29) instead of (12.34) and (12.35), respectively.


In many applications, the correlation time \(\tau_0\) is of interest. It is defined by

\[ \tau_0 = \frac{1}{C(0)} \int_{0}^{\infty} C(t)\,dt. \tag{12.36} \]

If \(|\tau| \le \tau_0\), then there is a significant correlation between \(X(t)\) and \(X(t+\tau)\).
If \(|\tau| > \tau_0\), then the correlation between \(X(t)\) and \(X(t+\tau)\) quickly decreases as \(|\tau|\)
tends to infinity.


Example 12.1 Let \(\{\ldots, X_{-1}, X_0, X_1, \ldots\}\) be the discrete white noise (purely random
sequence) defined on page 246. Its covariance function is
\[ C(\tau) = \begin{cases} \sigma^2 & \text{for } \tau = 0, \\ 0 & \text{for } \tau = \pm 1, \pm 2, \ldots. \end{cases} \tag{12.37} \]
Hence, from (12.33),

\[ s(\omega) = \sigma^2/2\pi. \]

Thus, the discrete white noise has a constant spectral density. This result is in accordance
with (12.31), since the covariance function of the discrete white noise given by
(12.37) is equivalent to \(C(\tau) = \sigma^2\,\delta(\tau)\). □


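Example 12.2 The covariance function of the first-order autoregressive sequence has
structure (page 249)
\[ C(\tau) = c\,a^{|\tau|}; \quad \tau = 0, \pm 1, \ldots, \]

where \(a\) and \(c\) are real constants and \(|a| < 1\). The corresponding spectral density is
obtained from (12.33) as follows:
\[ s(\omega) = \frac{1}{2\pi} \sum_{\tau=-\infty}^{+\infty} C(\tau)\,e^{-i\tau\omega} = \frac{c}{2\pi} \left[ \sum_{\tau=-\infty}^{-1} a^{-\tau} e^{-i\tau\omega} + \sum_{\tau=0}^{\infty} a^{\tau} e^{-i\tau\omega} \right] \]
\[ = \frac{c}{2\pi} \left[ \sum_{\tau=1}^{\infty} a^{\tau} e^{i\tau\omega} + \sum_{\tau=0}^{\infty} a^{\tau} e^{-i\tau\omega} \right]. \]
Hence,

\[ s(\omega) = \frac{c}{2\pi} \left[ \frac{a e^{i\omega}}{1 - a e^{i\omega}} + \frac{1}{1 - a e^{-i\omega}} \right]. \]
□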

Example 12.3 The random telegraph signal considered in example 7.3 (page 265)
has covariance function

\[ C(\tau) = a\,e^{-b|\tau|}, \quad a > 0, \; b > 0. \tag{12.38} \]
Since condition (12.28) is satisfied, the corresponding spectral density \(s(\omega)\) can be
obtained from (12.29):

\[ s(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\omega t}\,a\,e^{-b|t|}\,dt \]
\[ = \frac{a}{2\pi} \left\{ \int_{-\infty}^{0} e^{(b - i\omega)t}\,dt + \int_{0}^{\infty} e^{-(b + i\omega)t}\,dt \right\} \]
\[ = \frac{a}{2\pi} \left\{ \frac{1}{b - i\omega} + \frac{1}{b + i\omega} \right\}. \]
Hence,
\[ s(\omega) = \frac{ab}{\pi(\omega^2 + b^2)}. \]

The corresponding correlation time is \(\tau_0 = 1/b\).

This result is in line with Figure 12.3. Because of its simple structure, the covariance
function (12.38) is sometimes applied even if it only approximately coincides
with the actual covariance function. □
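As a plausibility check, the spectral density and the correlation time of this example can be reproduced by numerical integration. The following sketch is illustrative only: it uses the cosine form (12.35), truncates the infinite integration range at a point where \(e^{-bt}\) is negligible, and applies a hand-written trapezoidal rule; the parameter values and the helper names are assumptions made for the illustration.

```python
import numpy as np

def trapezoid(y, dx):
    """Composite trapezoidal rule on an equally spaced grid."""
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

a, b = 2.0, 1.5                         # illustrative parameters of C(tau) = a*exp(-b|tau|)
dt = 1e-3
t = np.arange(0.0, 40.0 + dt, dt)       # [0, 40] stands in for [0, infinity)
C = a * np.exp(-b * t)

def s_numeric(omega):
    """(12.35): s(omega) = (1/pi) * integral_0^inf cos(omega t) C(t) dt."""
    return trapezoid(np.cos(omega * t) * C, dt) / np.pi

def s_exact(omega):
    """Closed form a*b / (pi * (omega^2 + b^2))."""
    return a * b / (np.pi * (omega**2 + b**2))

tau0 = trapezoid(C, dt) / C[0]          # (12.36): correlation time, expected to be 1/b
print(s_numeric(1.0), s_exact(1.0), tau0, 1.0 / b)
```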


Figure 12.3 Covariance function for example 12.3

Figure 12.4 Spectral density for example 12.3

Figure 12.5 Covariance function for example 12.4

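Example 12.4 Let
\[ C(\tau) = \begin{cases} a\,(T - |\tau|) & \text{for } |\tau| \le T, \\ 0 & \text{for } |\tau| > T, \end{cases} \quad a > 0, \; T > 0. \tag{12.39} \]

Figure 12.5 shows the graph of this covariance function. For example, the covariance
function of the randomly delayed pulse code modulation considered in example 6.8
(page 236) has this structure (see Figures 6.4 and 6.5). The corresponding spectral
density is obtained by applying (12.29):

\[ s(\omega) = \frac{a}{2\pi} \int_{-T}^{+T} e^{-i\omega t}\,(T - |t|)\,dt \]
\[ = \frac{a}{2\pi} \left\{ T \int_{-T}^{+T} e^{-i\omega t}\,dt - \int_{0}^{+T} t\,e^{-i\omega t}\,dt - \int_{0}^{+T} t\,e^{i\omega t}\,dt \right\} \]
\[ = \frac{a}{2\pi} \left\{ \frac{2T}{\omega} \sin\omega T - 2 \int_{0}^{T} t\cos\omega t\,dt \right\}. \]

Hence,

\[ s(\omega) = \frac{a}{\pi}\,\frac{1 - \cos\omega T}{\omega^2}. \]

Figure 12.6 shows the graph of \(s(\omega)\). □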
Figure 12.6 Spectral density for example 12.4


The previous examples should not give rise to the conjecture that for every function
\(f(\tau)\) which tends to zero as \(|\tau| \to \infty\), a stationary stochastic process can be found
with \(f(\tau)\) being its covariance function. A slight modification of (12.39) yields a
counterexample:

\[ f(\tau) = \begin{cases} a\,(T - \tau^2) & \text{for } |\tau| \le T, \\ 0 & \text{for } |\tau| > T, \end{cases} \quad a > 0, \; T > 0. \]

If this function is substituted for \(C(t)\) in (12.29), then the resulting function \(s(\omega)\)
does not have properties (12.27). Therefore, \(f(\tau)\) cannot be the covariance function
of a stationary process.
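That the transform of such a function can become negative is easy to see numerically. The sketch below is an illustration under stated assumptions: it takes \(f(\tau) = a(T - \tau^2)\) on \(|\tau| \le T\) with the particular values \(a = 1\), \(T = 1\), evaluates the cosine form (12.35) with a hand-written trapezoidal rule, and looks at \(\omega = 2\pi\), where direct integration gives \(-1/(2\pi^3)\) for these values. The function names are arbitrary.

```python
import numpy as np

def trapezoid(y, dx):
    """Composite trapezoidal rule on an equally spaced grid."""
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

a, T = 1.0, 1.0                          # illustrative parameter values
t = np.linspace(0.0, T, 10_001)
dt = t[1] - t[0]
f = a * (T - t**2)                       # candidate 'covariance' on [0, T]; zero beyond T

def s_numeric(omega):
    """(12.35) applied to f: (1/pi) * integral_0^T cos(omega t) f(t) dt."""
    return trapezoid(np.cos(omega * t) * f, dt) / np.pi

value = s_numeric(2.0 * np.pi)
print(value, -1.0 / (2.0 * np.pi**3))    # negative, so s(omega) >= 0 in (12.27) fails
```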


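Example 12.5 The stochastic processes considered in the examples 6.6 and 6.7 have
covariance functions of the form

\[ C(\tau) = a\cos\omega_0\tau. \]
Using (12.8), the corresponding spectral density is obtained as follows:

\[ s(\omega) = \frac{a}{2\pi} \int_{-\infty}^{+\infty} e^{-i\omega t}\cos\omega_0 t\,dt = \frac{a}{4\pi} \int_{-\infty}^{+\infty} e^{-i\omega t}\big(e^{i\omega_0 t} + e^{-i\omega_0 t}\big)\,dt \]
\[ = \frac{a}{4\pi} \int_{-\infty}^{+\infty} e^{i(\omega_0 - \omega)t}\,dt + \frac{a}{4\pi} \int_{-\infty}^{+\infty} e^{-i(\omega_0 + \omega)t}\,dt. \]
Applying (12.32) yields a symbolic representation of \(s(\omega)\) (Figure 12.7):

\[ s(\omega) = \frac{a}{2}\{\delta(\omega_0 - \omega) + \delta(\omega_0 + \omega)\}. \tag{12.40} \]
Making use of (12.22), the corresponding spectral function is seen to be

\[ S(\omega) = \begin{cases} 0 & \text{for } \omega \le -\omega_0, \\ a/2 & \text{for } -\omega_0 < \omega \le \omega_0, \\ a & \text{for } \omega > \omega_0. \end{cases} \]

Thus, the spectral function is piecewise constant (Figure 12.7). □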
Figure 12.7 'Spectral density' and spectral function for example 12.5


Comment Since in example 12.5 the covariance function does not tend to zero as \(|\tau| \to \infty\), the
condition (12.28), which is necessary for applying (12.29), is not satisfied. This fact motivates
the occurrence of the delta function in (12.40). Hence, (12.40) as well as (12.24) are symbolic
representations of the spectral density. The usefulness of such symbolic representations based
on the delta function will be illustrated later for a heuristic characterization of the white noise.

If \(C_1(\tau)\) and \(C_2(\tau)\) are the covariance functions of two stationary processes and
\[ C(\tau) = C_1(\tau)\,C_2(\tau), \]

then it can be shown that there exists a stationary process with covariance function
\(C(\tau)\). The following example considers a stationary process whose covariance
function \(C(\tau)\) is the product of the covariance functions of the stationary processes
discussed in examples 12.3 and 12.5.


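Example 12.6 Let \(C(\tau)\) be given by the exponentially damped oscillation:
\[ C(\tau) = a\,e^{-b|\tau|}\cos\omega_0\tau, \tag{12.41} \]

where \(a > 0\), \(b > 0\), and \(\omega_0 > 0\). Thus, \(C(\tau)\) satisfies condition (12.28) so that the
corresponding spectral density can be obtained from (12.29):

\[ s(\omega) = \frac{a}{\pi} \int_{0}^{+\infty} e^{-bt}\cos(\omega t)\cos(\omega_0 t)\,dt \]
\[ = \frac{a}{2\pi} \int_{0}^{+\infty} e^{-bt}\,[\cos(\omega - \omega_0)t + \cos(\omega + \omega_0)t]\,dt. \]

Therefore, since \(\int_0^\infty e^{-bt}\cos ct\,dt = b/(b^2 + c^2)\),

\[ s(\omega) = \frac{ab}{2\pi} \left\{ \frac{1}{b^2 + (\omega - \omega_0)^2} + \frac{1}{b^2 + (\omega + \omega_0)^2} \right\}. \]

Functions of type (12.41) are frequently used to model covariance functions of stationary
processes (possibly approximately), whose observed covariances periodically
change their sign as \(\tau\) increases. A practical example for such a stationary process is
the fading of radio signals, which are recorded by radar. □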




12.3.2 White Noise 


In section 6.4.4 (page 246), the discrete white noise or a purely random sequence is
defined as a sequence \(\{X_1, X_2, \ldots\}\) of independent, identically distributed random
variables \(X_i\) with parameters \(E(X_i) = 0\) and \(\mathrm{Var}(X_i) = \sigma^2\). There is absolutely no
problem with this definition.

Now let us assume that the indices \(i\) refer to time points \(i\tau\), \(i = 1, 2, \ldots\). What happens
to the discrete white noise when \(\tau\) tends to zero? Then, even for arbitrarily small \(\tau\),
there will be no dependence between \(X_{i\tau}\) and \(X_{(i-1)\tau}\) as well as between \(X_{i\tau}\) and
\(X_{(i+1)\tau}\). Hence, a continuous-time stochastic process \(\{X(t), t \ge 0\}\), resulting from
passing to the limit as \(\tau \to 0\), must have the same covariance function as the
discrete-time white noise (see formula (6.37), page 246):

\[ C(\tau) = \mathrm{Cov}(X(t), X(t+\tau)) = \begin{cases} \sigma^2 & \text{for } \tau = 0, \\ 0 & \text{for } \tau \ne 0, \end{cases} \tag{12.42} \]

or, in terms of the delta-function, if the variance parameter \(\sigma^2\) is written as \(2\pi s_0\),

\[ C(\tau) = 2\pi s_0\,\delta(\tau). \tag{12.43} \]


One cannot really think of a stochastic process in continuous time having this covariance
function. Imagine \(\{X(t), t \ge 0\}\) measures the temperature depending on time \(t\) at
a location. Then the temperature at time point \(t\) would have no influence on the
temperature one second later. Since there is no dependence between \(X(t)\) and \(X(t+\tau)\) for
however small \(|\tau|\), the continuous white noise is frequently said to be the 'most
random process'.

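By formulas (12.29) and (12.31), the spectral density belonging to (12.43) is

\[ s(\omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-i\omega t}\,2\pi s_0\,\delta(t)\,dt = s_0 \]

so that
\[ \int_{-\infty}^{+\infty} s(\omega)\,d\omega = \infty. \]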


Hence, a continuous-time white noise process cannot exist, since its spectral density 
only satisfies the first condition of (12.27). Nevertheless, the concept of white noise 
as an approximate statistical model is of great importance for various phenomena in 
electronics, electrical engineering, communication, time series, econometrics, and 
other disciplines. Its outstanding role in applications can be compared with the one 
of the point mass in mechanics, which also only exists in theory. (A mathematically 
exact definition of the white noise process is, however, possible on the basis of
stochastic calculus even if white noise does not exist in the real world.) Here, as a 
working basis, the following explanation of the continuous white noise is given: 


The (continuous) white noise is a real, stationary, continuous-time stochastic 
process with constant spectral density. 
Figure 12.8 Illustration of a sample path of the white noise (time axis extremely stretched)


White noise can be thought of as a sequence of extremely sharp pulses, which occur 
after extremely short time intervals, and which have independent, identically distri- 
buted amplitudes with mean 0. The times in which the pulses rise and fall are so short 
that they cannot be registered by measuring instruments. Moreover, the response 
times of measurements are so large that during any response time a huge number of 
pulses occur, which cannot be registered (Figure 12.8). 


Remark The term ‘white noise' is due to a not fully justified comparison with the spectrum of 
the white light. This spectrum actually also has a wide-band structure, but its frequencies are 
not uniformly distributed over the entire bandwidth of the white light. 


A stationary process \(\{X(t), t \ge 0\}\) can be approximately considered a white noise
process if the covariance between \(X(t)\) and \(X(t+\tau)\) tends to 0 extremely fast with
increasing \(|\tau|\). For example, if \(X(t)\) denotes the absolute value of the force which
particles in a liquid are subjected to at time \(t\) (causing their Brownian motion), then
this force arises from the about \(10^{21}\) collisions per second between the particles and
the surrounding molecules of the liquid (assuming average temperature, pressure and
particle size). The process \(\{X(t), t \ge 0\}\) is known to be weakly stationary with a
covariance of type


\[ C(\tau) = e^{-b|\tau|} \quad \text{with } b \ge 10^{19}\ \text{sec}^{-1}. \]
Hence, \(X(t)\) and \(X(t+\tau)\) are practically uncorrelated if
\[ |\tau| \ge 10^{-18}. \]


A similarly fast drop of the covariance function can be observed if \(\{X(t), t \ge 0\}\)
describes the fluctuations of the electromotive force in a conductor, which is caused
by the thermal movement of electrons.
Example 12.7 Let \(\{N(t), t \ge 0\}\) be a homogeneous Poisson process with intensity \(\lambda\)
and \(\{X(t), t \ge 0\}\) be a shot noise process (see example 6.5, page 229) defined by

\[ X(t) = \sum_{i=1}^{N(t)} h(t - T_i), \]

where the function \(h(t)\) quantifies the response of a system to the Poisson events
arriving at time points \(T_i\). In this example, the system is a vacuum tube, where a
current impulse is initiated as soon as the cathode emits an electron. If \(e\) denotes the
charge on an electron and if an emitted electron arrives at the anode after \(z\) time
units, then the current impulse induced by an electron is known to be

\[ h(t) = \begin{cases} \dfrac{\alpha e}{z^2}\,t & \text{for } 0 \le t \le z, \\ 0 & \text{elsewhere,} \end{cases} \]

where \(\alpha\) is a tube-specific constant. \(X(t)\) is, therefore, the total current flowing in the
tube at time \(t\). Now the covariance function of the process \(\{X(t), t \ge 0\}\) can immediately
be derived from the covariance function (7.32), page 272. The result is

\[ C(s,t) = \begin{cases} \dfrac{\lambda(\alpha e)^2}{3z} \left[ 1 - \dfrac{3|t-s|}{2z} + \dfrac{|t-s|^3}{2z^3} \right] & \text{for } |t-s| \le z, \\ 0 & \text{elsewhere.} \end{cases} \]

Since
\[ \lim_{z \to 0} C(s,t) = \delta(s - t), \]

this shot noise process \(\{X(t), t \ge 0\}\) behaves approximately as white noise if the
transmission time \(z\) is sufficiently small. □
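The bracketed polynomial in this covariance function can be cross-checked numerically. The sketch below rests on an assumption that is not taken from this section: that the shot noise covariance for a Poisson process with intensity \(\lambda\) and linear response \(h(t) = (\alpha e/z^2)\,t\) on \([0, z]\) has the form \(\lambda \int h(x)\,h(x + |t-s|)\,dx\). The parameter values and function names are purely illustrative, and the product \(\alpha e\) is treated as a single number.

```python
def h(t, ae, z):
    """Current impulse: (alpha*e / z^2) * t on [0, z], zero elsewhere."""
    return ae / z**2 * t if 0.0 <= t <= z else 0.0

def cov_numeric(tau, lam, ae, z, n=100_000):
    """Assumed form lam * integral_0^z h(x) h(x + |tau|) dx, by the midpoint rule."""
    tau = abs(tau)
    step = z / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * step
        total += h(x, ae, z) * h(x + tau, ae, z) * step
    return lam * total

def cov_closed(tau, lam, ae, z):
    """Closed form lam*(ae)^2/(3z) * [1 - 3|tau|/(2z) + |tau|^3/(2z^3)] for |tau| <= z."""
    tau = abs(tau)
    if tau > z:
        return 0.0
    return lam * ae**2 / (3 * z) * (1 - 3 * tau / (2 * z) + tau**3 / (2 * z**3))

lam, ae, z = 3.0, 1.0, 2.0               # illustrative values
print(cov_numeric(0.5, lam, ae, z), cov_closed(0.5, lam, ae, z))
```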


Band-Limited White Noise As already pointed out, a stationary process with constant
spectral density over an unlimited bandwidth cannot exist. A stationary process with
spectral density
\[ s(\omega) = \begin{cases} s_0 & \text{for } -w/2 \le \omega \le +w/2, \\ 0 & \text{otherwise,} \end{cases} \]

can, however, exist (Figure 12.9a). By making use of formulas (12.26) and (12.8), the corresponding
covariance function is seen to be (Figure 12.9b)

\[ C(\tau) = \int_{-w/2}^{+w/2} e^{i\omega\tau}\,s_0\,d\omega = 2 s_0\,\frac{\sin(w\tau/2)}{\tau}. \]

The mean power of such a process is proportional to \(C(0) = s_0 w\), since
\[ \lim_{x \to 0} \frac{\sin x}{x} = 1. \]

The parameter \(w\) is the bandwidth of its spectrum. With increasing \(w\) the band-limited
white noise process behaves increasingly like a white noise.
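The covariance function of the band-limited white noise can be verified by integrating the constant spectral density numerically. The following is a minimal sketch with illustrative values for \(s_0\) and \(w\); because the density is symmetric, only the cosine part of \(e^{i\omega\tau}\) contributes, and the midpoint rule and function names are arbitrary choices.

```python
import math

def cov_numeric(tau, s0, w, n=20_000):
    """Midpoint-rule value of the integral of cos(omega*tau) * s0 over [-w/2, w/2]."""
    step = w / n
    total = 0.0
    for k in range(n):
        omega = -w / 2 + (k + 0.5) * step
        total += math.cos(omega * tau) * s0 * step
    return total

def cov_closed(tau, s0, w):
    """Closed form 2*s0*sin(w*tau/2)/tau, with the limit value s0*w at tau = 0."""
    if tau == 0.0:
        return s0 * w
    return 2.0 * s0 * math.sin(w * tau / 2.0) / tau

s0, w = 0.5, 4.0                          # illustrative values
for tau in (0.0, 0.8, 3.0):
    print(tau, cov_numeric(tau, s0, w), cov_closed(tau, s0, w))
```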
Figure 12.9 Spectral density and covariance function of the band-limited white noise


12.4 EXERCISES 


12.1) Define the stochastic process \(\{X(t), t \in \mathbf{R}\}\) by
\[ X(t) = A\cos(\omega t + \Phi), \]

where \(A\) and \(\Phi\) are independent random variables with \(E(A) = 0\) and \(\Phi\) is uniformly
distributed over the interval \([0, 2\pi]\).

Check whether the covariance function of the weakly stationary process \(\{X(t), t \in \mathbf{R}\}\)
can be obtained from the limit relation (12.5).

The covariance function of a slightly more general process has been determined in example 6.6
on page 235.
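12.2) A weakly stationary, continuous-time process has covariance function
\[ C(\tau) = \sigma^2 e^{-\alpha|\tau|} \left( \cos\beta\tau - \frac{\alpha}{\beta}\sin\beta|\tau| \right). \]

Prove that its spectral density is given by

\[ s(\omega) = \frac{2\sigma^2\alpha\,\omega^2}{\pi\left[(\omega^2 + \alpha^2 + \beta^2)^2 - 4\beta^2\omega^2\right]}. \]
12.3) A weakly stationary continuous-time process has covariance function
\[ C(\tau) = \sigma^2 e^{-\alpha|\tau|} \left( \cos\beta\tau + \frac{\alpha}{\beta}\sin\beta|\tau| \right). \]
Prove that its spectral density is given by
\[ s(\omega) = \frac{2\sigma^2\alpha\,(\alpha^2 + \beta^2)}{\pi\left[(\omega^2 + \alpha^2 - \beta^2)^2 + 4\alpha^2\beta^2\right]}. \]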
12.4) A weakly stationary continuous-time process has covariance function
\[ C(\tau) = a\,e^{-b\tau^2} \quad \text{for } a > 0, \; b > 0. \]

Prove that its spectral density is given by

\[ s(\omega) = \frac{a}{2\sqrt{\pi b}}\,e^{-\omega^2/(4b)}. \]


12.5) Define a weakly stationary stochastic process \(\{V(t), t \ge 0\}\) by
\[ V(t) = S(t+1) - S(t), \]
where \(\{S(t), t \ge 0\}\) is the standard Brownian motion process.

Prove that its spectral density is proportional to
\[ \frac{1 - \cos\omega}{\omega^2}. \]


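12.6) A weakly stationary, continuous-time stochastic process has spectral density

\[ s(\omega) = \sum_{k=1}^{n} \frac{\alpha_k}{\omega^2 + \beta_k^2}, \quad \alpha_k > 0. \]

Prove that its covariance function is given by

\[ C(\tau) = \pi \sum_{k=1}^{n} \frac{\alpha_k}{\beta_k}\,e^{-\beta_k|\tau|}, \quad \alpha_k > 0. \]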


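12.7) A weakly stationary, continuous-time stochastic process has spectral density

\[ s(\omega) = \begin{cases} 0 & \text{for } |\omega| < \omega_0 \text{ or for } |\omega| > 2\omega_0, \\ a^2 & \text{for } \omega_0 \le |\omega| \le 2\omega_0. \end{cases} \]
Prove that its covariance function is given by

\[ C(\tau) = 2a^2 \sin(\omega_0\tau)\,\frac{2\cos(\omega_0\tau) - 1}{\tau}. \]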